Wu (Otani): Introduction to Object-Oriented Programming with Java, 4th Edition. 9. Characters and Strings. © The McGraw-Hill Companies, 2005

9 Characters and Strings

Objectives

After you have read and studied this chapter, you should be able to
• Declare and manipulate data of the char type.
• Write string processing programs, using String, StringBuilder, and StringBuffer objects.
• Specify regular expressions for searching a pattern in a string.
• Differentiate the String, StringBuilder, and StringBuffer classes and use the correct class in solving a given task.
• Tell the difference between equality and equivalence testing for String objects.
• Use the Pattern and Matcher classes.

Introduction

Early computers in the 1940s and 1950s were more like gigantic calculators because they were used primarily for numerical computation. However, as computers have evolved to possess more computational power, our use of computers is no longer limited to numerical computation. Today we use computers for processing information of diverse types. In fact, most application software today, such as Web browsers, word processors, database management systems, presentation software, and graphics design software, is not intended specifically for number crunching. These programs still perform numerical computation, but their primary data are text, graphics, video, and other nonnumerical data. We have already seen examples of nonnumerical data processing. We introduced the String class and string processing in Chapter 2. A nonnumerical data type called boolean was used in Chapters 5 and 6. In this chapter, we will delve more deeply into the String class and present advanced string processing.
We will also introduce the char data type for representing a single character and the StringBuffer class for efficient handling of a certain type of string processing.

9.1 Characters

In Java, single characters are represented by using the data type char. Character constants are written as symbols enclosed in single quotes, for example, 'a', 'X', and '5'. Just as we use different formats to represent integers and real numbers using 0s and 1s in computer memory, we use special codes of 0s and 1s to represent single characters. For example, we may assign 1 to represent 'A' and 2 to represent 'B'. We can assign codes similarly to lowercase letters, punctuation marks, digits, and other special symbols.

In the early days of computing, different computers used not only different coding schemes but also different character sets. For example, one computer could represent the symbol ¼, while other computers could not. Individualized coding schemes did not allow computers to share information: documents created by using one scheme were complete gibberish when read by using another scheme. To avoid this problem, U.S. computer manufacturers devised several coding schemes. One of the coding schemes widely used today is ASCII (American Standard Code for Information Interchange). We pronounce ASCII "ăs kē."

Table 9.1 shows the 128 standard ASCII codes. Adding the row and column indexes gives you the ASCII code for a given character. For example, the value 87 (row 80, column 7) is the ASCII code for the character 'W'. Not all characters in the table are printable. ASCII codes 0 through 31 and 127 are nonprintable control characters. For example, ASCII code 7 is the bell (the computer beeps when you send this character to output), and code 9 is the tab.
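The row-plus-column rule can be checked in a few lines of Java. This is an illustrative sketch only: the class AsciiCodes and its methods are hypothetical names, and it uses the (int) conversion described at the end of this section to obtain a character's code.

```java
class AsciiCodes {

    // Returns the numeric code of a character; for the characters
    // in Table 9.1 this is the ASCII code.
    static int codeOf(char ch) {
        return (int) ch;
    }

    // ASCII codes 0 through 31 and 127 are nonprintable control characters.
    static boolean isControl(int code) {
        return (code >= 0 && code <= 31) || code == 127;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('W'));              // 87: row 80 + column 7
        System.out.println(codeOf('\t'));             // 9: the tab character
        System.out.println(isControl(7));             // true: the bell
        System.out.println(isControl(codeOf('A')));   // false: 'A' (65) is printable
    }
}
```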
When we use a word processor to create a document, the file that contains the document includes not only the contents but also the formatting information. Since each software company uses its own coding scheme for storing this information, we have to use the same word processor to open the document. Often it is even worse: we cannot open a document created by a newer version of the same word processor with an older version. If we just want to exchange the text of a document, then we can convert it to ASCII format. Any word processor can open and save ASCII files. If we would like to retain the formatting information also, we can convert the document, using software such as Adobe Acrobat. This software converts a document (including text, formatting, images, etc.) created by different word processors to a format called PDF. Anybody with a free Acrobat Reader can open a PDF file. Many of the documents available from our website are in this PDF format.

To represent all 128 ASCII codes, we need 7 bits ranging from 000 0000 (0) to 111 1111 (127). Although 7 bits is enough, ASCII codes occupy 1 byte (8 bits) because the byte is the smallest unit of memory you can access. Computer manufacturers use the extra bit for other nonstandard symbols (e.g., lines and boxes). Using 8 bits, we can represent 256 symbols in total: 128 standard ASCII codes and 128 nonstandard symbols.

Table 9.1 ASCII codes.

           0    1    2    3    4    5    6    7    8    9
      0   nul  soh  stx  etx  eot  enq  ack  bel  bs   ht
     10   lf   vt   ff   cr   so   si   dle  dc1  dc2  dc3
     20   dc4  nak  syn  etb  can  em   sub  esc  fs   gs
     30   rs   us   sp   !    "    #    $    %    &    '
     40   (    )    *    +    ,    -    .    /    0    1
     50   2    3    4    5    6    7    8    9    :    ;
     60   <    =    >    ?    @    A    B    C    D    E
     70   F    G    H    I    J    K    L    M    N    O
     80   P    Q    R    S    T    U    V    W    X    Y
     90   Z    [    \    ]    ^    _    `    a    b    c
    100   d    e    f    g    h    i    j    k    l    m
    110   n    o    p    q    r    s    t    u    v    w
    120   x    y    z    {    |    }    ~    del
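Table 9.1 can also be regenerated from the codes themselves. In this minimal sketch (the class AsciiTable and its row method are hypothetical), each control character is printed as a period instead of its abbreviation such as nul or bel, and code 32 (sp) prints as a blank.

```java
class AsciiTable {

    // Builds one row of Table 9.1: the characters whose codes run from
    // rowStart to rowStart + 9. Control characters (codes 0-31 and 127)
    // are shown as '.', and codes of 128 and above are omitted.
    static String row(int rowStart) {
        StringBuilder sb = new StringBuilder();
        for (int code = rowStart; code < rowStart + 10 && code < 128; code++) {
            boolean control = code < 32 || code == 127;
            sb.append(control ? '.' : (char) code);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int rowStart = 0; rowStart < 128; rowStart += 10) {
            System.out.printf("%3d  %s%n", rowStart, row(rowStart));
        }
    }
}
```

For example, row(80) yields PQRSTUVWXY, matching the row labeled 80 in the table.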
The standard ASCII codes work just fine as long as we are dealing with the English language because all letters and punctuation marks used in English are included in the ASCII codes. We cannot say the same for other languages. For languages such as French and German, the additional 128 codes may be used to represent character symbols not available in standard ASCII. But what about different currency symbols? What about non-European languages? Chinese, Japanese, and Korean all use different coding schemes to represent their character sets. Eight bits is not enough to represent thousands of ideographs. If we try to read Japanese characters by using ASCII, we will see only meaningless symbols.

Unicode

To accommodate the character symbols of non-English languages, the Unicode Consortium established the Unicode Worldwide Character Standard, commonly known simply as Unicode, to support the interchange, processing, and display of the written texts of diverse languages. When this edition was written, the standard contained 34,168 distinct characters, which cover the major languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica. To accommodate such a large number of distinct character symbols, Unicode characters in this scheme occupy 2 bytes (16 bits), which is also the size of a Java char. Unicode codes for the character set shown in Table 9.1 are the same as ASCII codes.

Java, being a language for the Internet, uses the Unicode standard for representing char constants. Although Java uses the Unicode standard internally to store characters, to use foreign characters for input and output in our programs, the operating system and the development tool we use for Java programs must be capable of handling the foreign characters.

Characters are declared and used in a manner similar to data of other types. The declaration

    char ch1, ch2 = 'X';

declares two char variables ch1 and ch2 with ch2 initialized to 'X'.
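The following sketch exercises the declaration above together with a few facts about char. The class and method names are hypothetical, and the sketch assumes Java 5 or later for String.codePointCount. Characters that later versions of Unicode placed beyond the 16-bit range do not fit in a single char; a String stores each of them as a pair of char values.

```java
class UnicodeChars {

    // Number of char values (16-bit units) in s
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in s
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        char ch1, ch2 = 'X';           // the declaration from the text
        ch1 = '\u0041';                // a Unicode escape for 'A'

        // For the characters in Table 9.1, Unicode and ASCII codes agree.
        System.out.println((int) ch1);                  // 65
        System.out.println((int) ch2);                  // 88

        // A char is 16 bits wide, so its largest value is 65535.
        System.out.println((int) Character.MAX_VALUE);  // 65535

        // U+1F600 lies beyond 16 bits; a String stores it as two chars.
        String smiley = "\uD83D\uDE00";
        System.out.println(codeUnits(smiley));          // 2
        System.out.println(codePoints(smiley));         // 1
    }
}
```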
We can display the ASCII code of a character by converting it to an integer. For example, we can execute

    JOptionPane.showMessageDialog(null, "ASCII code of character X is " + (int)'X' );

Conversely, we can see a character by converting its ASCII code to the char data type, for example,

    JOptionPane.showMessageDialog(null, "Character with ASCII code 88 is " + (char)88 );

Because the characters have numerical ASCII values, we can compare characters just as we compare integers and real numbers. For example, the comparison

    'A' < 'c'

returns true because the ASCII value of 'A' is 65 while that of 'c' is 99.

1. Determine the output of the following statements.
   a. System.out.println( (char) 65 );
   b. System.out.println( (int) 'C' );
   c. System.out.println( 'Y' );
   d. if ( 'A' < '?' )
          System.out.println( 'A' );
      else
          System.out.println( '?' );
2. How many distinct characters can you represent by using 8 bits?

9.2 Strings

A string is a sequence of characters that is treated as a single value. Instances of the String class are used to represent strings in Java. Rudimentary string processing was already presented in Chapter 2, using methods such as substring, length, and indexOf. In this section we will learn more advanced string processing, using other methods of the String class.

To introduce additional methods of the String class, we will go through a number of common string processing routines. The first is to process a string looking for a certain character or characters. Let's say we want to input a person's name and determine the number of vowels that the name contains. The basic idea is very simple:

    for each character ch in the string {
        if (ch is a vowel) {
            increment the counter
        }
    }

There are two details we need to know before being able to translate that into actual code.
First, we need to know how to refer to an individual character in the string. Second, we need to know how to determine the size of the string, that is, the number of characters the string contains, so we can write the boolean expression to stop the loop correctly. We know from Chapter 2 that the second task is done by using the length method. For the first task, we use charAt.

We access individual characters of a string by calling the charAt method of the String object. For example, to display the individual characters of the string Sumatra one at a time, we can write

    String name = "Sumatra";
    int    size = name.length();

    for (int i = 0; i < size; i++) {
        System.out.println(name.charAt(i));
    }

[Figure 9.1 An indexed expression is used to refer to individual characters in a string. For String name = "Sumatra"; positions 0 through 6 hold S, u, m, a, t, r, a. The variable name refers to the whole string; the call name.charAt(3) returns the character at position 3.]

Each character in a string has an index that we use to access the character. We use zero-based indexing; that is, the first character has index 0, the second character has index 1, the third character has index 2, and so forth. To refer to the first character of name, for example, we say

    name.charAt(0)

Since the characters are indexed from 0 to size-1, we could express the preceding for loop as

    for (int i = 0; i <= size - 1; i++)

However, we will use the first style almost exclusively to be consistent.

Figure 9.1 illustrates how the charAt method works. Notice that name refers to a String object, and we are calling its charAt method that returns a value of primitive data type char.
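To see the indexes that charAt uses, this hypothetical helper pairs each position of a string with its character. It is a minimal sketch built only from length and charAt, following the loop shown above.

```java
class CharAtDemo {

    // Lists each character of s with its zero-based index,
    // for example "0:S 1:u 2:m ..." for "Sumatra".
    static String indexed(String s) {
        StringBuilder sb = new StringBuilder();
        int size = s.length();
        for (int i = 0; i < size; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(i).append(':').append(s.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "Sumatra";
        System.out.println(indexed(name));                  // 0:S 1:u 2:m 3:a 4:t 5:r 6:a
        System.out.println(name.charAt(0));                 // S, the first character
        System.out.println(name.charAt(name.length() - 1)); // a, the last character
    }
}
```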
Strictly speaking, we must say "name is a variable of type String whose value is a reference to an instance of String." However, when the value of a variable X is a reference to an instance of class Y, we usually say "X is an instance of Y" or "X is a Y object."

Since String is a class, we can create an instance of it by using the new operator. The statements we have been using so far, such as

    String name1 = "Kona";
    String name2;
    name2 = "Espresso";

work as a shorthand for

    String name1 = new String("Kona");
    String name2;
    name2 = new String("Espresso");

Be aware that this shorthand works for the String class only. Moreover, although the difference will not be critical in almost all situations, they are not exactly the same. We will discuss the subtle difference between the two in Section 9.5.

Here is the code for counting the number of vowels:

    /*
        Chapter 9 Sample Program: Count the number of vowels
                                  in a given string

        File: Ch9CountVowels.java
    */
    import javax.swing.*;

    class Ch9CountVowels {

        public static void main (String[] args) {

            String name;
            int    numberOfCharacters,
                   vowelCount = 0;
            char   letter;

            name = JOptionPane.showInputDialog(null, "What is your name?");

            numberOfCharacters = name.length();

            for (int i = 0; i < numberOfCharacters; i++) {

                letter = name.charAt(i);

                if (    letter == 'a' || letter == 'A' ||
                        letter == 'e' || letter == 'E' ||
                        letter == 'i' || letter == 'I' ||
                        letter == 'o' || letter == 'O' ||
                        letter == 'u' || letter == 'U'    ) {

                    vowelCount++;
                }
            }

            JOptionPane.showMessageDialog(null, name + ", your name has " +
                                                vowelCount + " vowels");
        }
    }
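For experimenting without dialog boxes, the same loop can be moved into a method that takes the string as a parameter. This sketch is an alternative, not the program above: it replaces the chain of ten comparisons with a lookup, treating a character as a vowel when indexOf finds it in the string "aeiouAEIOU". The class and method names are hypothetical.

```java
class VowelCounter {

    private static final String VOWELS = "aeiouAEIOU";

    // Counts the characters of s that appear in VOWELS.
    static int countVowels(String s) {
        int vowelCount = 0;
        for (int i = 0; i < s.length(); i++) {
            // indexOf returns -1 when the character is not in VOWELS
            if (VOWELS.indexOf(s.charAt(i)) >= 0) {
                vowelCount++;
            }
        }
        return vowelCount;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("Sumatra"));    // 3
        System.out.println(countVowels("ESPRESSO"));   // 3
    }
}
```

Keeping the vowel list in one constant means a change such as treating y as a vowel touches a single line rather than the if condition.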
We can shorten the boolean expression in the if statement by using the toUpperCase method of the String class. This method converts every character in a string to uppercase. Here's the rewritten code:

    /*
        Chapter 9 Sample Program: Count the number of vowels
                                  in a given string using toUpperCase

        File: Ch9CountVowels2.java
    */
    import javax.swing.*;

    class Ch9CountVowels2 {

        public static void main (String[] args) {

            String name, nameUpper;
            int    numberOfCharacters,
                   vowelCount = 0;
            char   letter;

            name = JOptionPane.showInputDialog(null, "What is your name?");

            numberOfCharacters = name.length();

            nameUpper = name.toUpperCase();

            for (int i = 0; i < numberOfCharacters; i++) {

                letter = nameUpper.charAt(i);

                if (    letter == 'A' ||
                        letter == 'E' ||
                        letter == 'I' ||
                        letter == 'O' ||
                        letter == 'U'    ) {

                    vowelCount++;
                }
            }

            JOptionPane.showMessageDialog(null, name + ", your name has " +
                                                vowelCount + " vowels");
        }
    }

Notice that the original string name is unchanged. A new, converted string is returned from the toUpperCase method and assigned to the second String variable nameUpper.

Let's try another example. This time we read in a string and count how many words the string contains. For this example we consider a word as a sequence of characters separated, or delimited, by blank spaces. We treat punctuation marks and other symbols as part of a word. Expressing the task in pseudocode, we have the following:

    read in a sentence;
    while (there are more characters in the sentence) {
        look for the beginning of the next word;
        now look for the end of this word;
        increment the word counter;
    }

We use a while loop here instead of do-while to handle the case when the input sentence contains no characters, that is, when it is an empty string.
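The reason for preferring while over do-while can be seen directly. In this hypothetical sketch, both methods walk through a string with charAt; only the do-while version calls charAt(0) on an empty string, which is not a valid position.

```java
class EmptyStringLoops {

    // A while loop tests its condition before the first pass, so for an
    // empty string the body never runs and charAt is never called.
    static int countWithWhile(String sentence) {
        int index = 0;
        while (index < sentence.length()) {
            sentence.charAt(index);   // always a valid position here
            index++;
        }
        return index;
    }

    // A do-while loop runs its body once before testing, so for an empty
    // string it calls charAt(0) and throws an exception.
    static int countWithDoWhile(String sentence) {
        int index = 0;
        do {
            sentence.charAt(index);
            index++;
        } while (index < sentence.length());
        return index;
    }

    public static void main(String[] args) {
        System.out.println(countWithWhile(""));     // 0
        System.out.println(countWithDoWhile("ab")); // 2
        try {
            countWithDoWhile("");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("do-while failed on the empty string");
        }
    }
}
```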
Let's implement the routine. Here's our first attempt:

    //Attempt No. 1 (Bad Version)
    static final char BLANK = ' ';   //a class-level constant

    int index, wordCount, numberOfCharacters;

    String sentence = JOptionPane.showInputDialog(null, "Enter a sentence:");

    numberOfCharacters = sentence.length();
    index     = 0;
    wordCount = 0;

    while (index < numberOfCharacters ) {

        //ignore blank spaces: skip blank spaces until a character that is
        //not a blank space is encountered; this is the beginning of a word
        while (sentence.charAt(index) == BLANK) {
            index++;
        }

        //now locate the end of the word: once the beginning of a word is
        //detected, skip nonblank characters until a blank space is
        //encountered; this is the end of the word
        while (sentence.charAt(index) != BLANK) {
            index++;
        }

        //another word has been found, so increment the counter
        wordCount++;
    }

This implementation has a problem. The counter variable index is incremented inside the two inner while loops, and it can reach numberOfCharacters while one of those loops is still running. The next call to sentence.charAt(index) is then an error, because the position of the last character is numberOfCharacters - 1. We need to modify the two inner while loops so that charAt is never called with an index larger than numberOfCharacters - 1.
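The failure is easy to reproduce. This sketch copies attempt 1's loops into a method (the names are hypothetical) and reports whether a call throws. The last run of characters, blank or nonblank, always extends to the end of the string, so for every nonempty sentence one of the inner loops eventually calls charAt(numberOfCharacters).

```java
class Attempt1Failure {

    private static final char BLANK = ' ';

    // Attempt 1's loops, unchanged: charAt is called without checking
    // that index is still a valid position.
    static int countWordsAttempt1(String sentence) {
        int numberOfCharacters = sentence.length();
        int index = 0;
        int wordCount = 0;
        while (index < numberOfCharacters) {
            while (sentence.charAt(index) == BLANK) {
                index++;
            }
            while (sentence.charAt(index) != BLANK) {
                index++;
            }
            wordCount++;
        }
        return wordCount;
    }

    // Returns true if attempt 1 throws on the given sentence.
    static boolean fails(String sentence) {
        try {
            countWordsAttempt1(sentence);
            return false;
        } catch (IndexOutOfBoundsException e) {
            // charAt throws StringIndexOutOfBoundsException, a subclass
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fails("Hello"));  // true: runs past the last word
        System.out.println(fails("Hi "));    // true: runs past the trailing blank
        System.out.println(fails(""));       // false: the outer loop never starts
    }
}
```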
Here's the modified code:

    /*
        Chapter 9 Sample Program: Count the number of words
                                  in a given string

        File: Ch9CountWords.java (Attempt 2)
    */
    import javax.swing.*;

    class Ch9CountWords { //Attempt 2

        private static final char BLANK = ' ';

        public static void main (String[] args) {

            int index, wordCount, numberOfCharacters;

            String sentence = JOptionPane.showInputDialog(null, "Enter a sentence:");

            numberOfCharacters = sentence.length( );
            index              = 0;
            wordCount          = 0;

            while ( index < numberOfCharacters ) {

                //ignore blank spaces
                while (index < numberOfCharacters &&
                       sentence.charAt(index) == BLANK) {
                    index++;
                }

                //now locate the end of the word
                while (index < numberOfCharacters &&
                       sentence.charAt(index) != BLANK) {
                    index++;
                }

                //another word is found, so increment the counter
                wordCount++;
            }

            //display the result
            System.out.println( "Input sentence: " + sentence );
            System.out.println("\n");
            System.out.println( "  Word count: " + wordCount + " words" );
        }
    }

Notice that the order of comparisons in the boolean expression

    index < numberOfCharacters && sentence.charAt(index) == BLANK

is critical. If we switch the order to

    sentence.charAt(index) == BLANK && index < numberOfCharacters

and the last character in the string is a space, then an out-of-bounds exception will occur, because charAt is called with a value of index that is a position that does not exist in the string sentence. By putting the expression correctly as

    index < numberOfCharacters && sentence.charAt(index) == BLANK

we will not get an out-of-bounds exception, because the boolean operator && is a short-circuit operator. If the relation index < numberOfCharacters is false, then the second half of the expression, sentence.charAt(index) == BLANK, will not get evaluated.

There is still a problem with the attempt 2 code.
If the sentence ends with one or more blank spaces, then the value for wordCount will be 1 more than the actual number of words in the sentence. It is left as an exercise to correct this bug (see Exercise 15 at the end of the chapter).

Our third example counts the number of times the word Java occurs in the input. The repetition stops when the word STOP is read. Lowercase and uppercase letters are not distinguished when an input word is compared to Java, but the word STOP for terminating the loop must be in all uppercase letters. Here's the pseudocode:

    javaCount = 0;
    while (true) {
        read in next word;
        if (word is "STOP") {
            break;
        } else if (word is "Java" ignoring cases) {
            javaCount++;
        }
    }

And here's the actual code. Pay close attention to how the strings are compared.

    /*
        Chapter 9 Sample Program: Count the number of times the word 'java'
                                  occurs in input. Case-insensitive comparison
                                  is used here. The program terminates when
                                  the word STOP (case-sensitive) is entered.

        File: Ch9CountJava.java
    */
    import javax.swing.*;

    class Ch9CountJava {

        public static void main (String[] args) {

            int    javaCount = 0;
            String word;

            while (true) {

                word = JOptionPane.showInputDialog(null, "Next word:");

                if ( word.equals("STOP") ) {
                    break;

                } else if ( word.equalsIgnoreCase("Java") ) {
                    javaCount++;
                }
            }

            System.out.println("'Java' count: " + javaCount );
        }
    }

String comparison is done by two methods, equals and equalsIgnoreCase, whose meanings should be clear from the example. Another comparison method is compareTo. This method compares two String objects str1 and str2 as in

    str1.compareTo( str2 );

and returns 0 if they are equal, a negative integer if str1 is less than str2, and a positive integer if str1 is greater than str2.
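The three comparison methods can be tried side by side. The class CompareDemo and its sign helper are hypothetical; sign reduces the result of compareTo to -1, 0, or 1, since the text describes that result only as negative, zero, or positive.

```java
class CompareDemo {

    // Reduces the result of compareTo to its sign: -1, 0, or 1.
    static int sign(String str1, String str2) {
        return Integer.signum(str1.compareTo(str2));
    }

    public static void main(String[] args) {
        // equals is case-sensitive; equalsIgnoreCase is not
        System.out.println("STOP".equals("stop"));             // false
        System.out.println("java".equalsIgnoreCase("JAVA"));   // true

        // compareTo: negative, zero, or positive
        System.out.println(sign("apple", "banana"));     // -1
        System.out.println(sign("banana", "apple"));     // 1
        System.out.println(sign("apple", "apple"));      // 0
        System.out.println(sign("Java", "java"));        // -1: 'J' (74) < 'j' (106)
        System.out.println(sign("Java", "JavaScript"));  // -1: a prefix comes first
    }
}
```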
The comparison is based on the lexicographic order of Unicode. For example, caffeine is less than latte. Also, the string jaVa is less than the string java because the Unicode value of V is smaller than the Unicode value of v. (See the ASCII table, Table 9.1.)

Some of you may be wondering why we don't say

    if ( word == "STOP" )

We can, in fact, use the equality comparison symbol == to compare two String objects, but the result is different from the result of the method equals. We will explain the difference in Section 9.5.

Let's try another example, using the substring method we introduced in Chapter 2. To refresh our memory, here's how the method works. If str is a String object, then the expression

    str.substring ( beginIndex, endIndex )

returns a new string that is a substring of str from position beginIndex to endIndex - 1. The values of beginIndex and endIndex must each be between 0 and str.length(), and the value of beginIndex must be less than or equal to the value of endIndex. Passing invalid values for beginIndex or endIndex will result in a runtime error.

The following code creates a new string Javanist from Alpinist by using the substring method.

    String oldWord = "Alpinist";
    String newWord = "Java" + oldWord.substring(4,8);

In the next example, we print out the words from a given sentence, using one line per word. For example, given an input sentence

    I want to be a Java programmer

the code will print out

    I
    want
    to
    be
    a
    Java
    programmer

This sample code is similar to the previous one that counts the number of words in a given sentence.
Instead of just counting the words, we need to extract the word from the sentence and print it out. Here's how we write the code:

    /*
        Chapter 9 Sample Program: Extract the words in a given sentence
                                  and print them, using one line per word.

        File: Ch9ExtractWords.java
    */
    import javax.swing.*;

    class Ch9ExtractWords {

        private static final char BLANK = ' ';

        public static void main (String[] args) {

            int    index, numberOfCharacters,
                   beginIdx, endIdx;
            String word,
                   sentence = JOptionPane.showInputDialog(null, "Input:");

            numberOfCharacters = sentence.length();
            index              = 0;

            while ( index < numberOfCharacters ) {

                //ignore leading blank spaces
                while (index < numberOfCharacters &&
                       sentence.charAt(index) == BLANK) {
                    index++;
                }

                beginIdx = index;

                //now locate the end of the word
                while (index < numberOfCharacters &&
                       sentence.charAt(index) != BLANK) {
                    index++;
                }

                endIdx = index;

                if (beginIdx != endIdx) {
                    //another word is found, extract it from the
                    //sentence and print it out
                    word = sentence.substring( beginIdx, endIdx );
                    System.out.println(word);
                }
            }
        }
    }

Notice the significance of the test

    if (beginIdx != endIdx)

in the code. For what kinds of input sentences will the variables beginIdx and endIdx be equal? We'll leave this as an exercise (see Exercise 16 at the end of the chapter).

1. Determine the output of the following code.
   a. String str = "Programming";
      for (int i = 0; i < 9; i+=2) {
          System.out.print( str.charAt( i ) );
      }
   b. String str = "World Wide Web";
      for (int i = 0; i < 10; i ++ ) {
          if ( str.charAt(i) == 'W') {
              System.out.println( 'M' );
          } else {
              System.out.print( str.charAt(i) );
          }
      }
2. Write a loop that prints out a string in reverse. If the string is Hello, then the code outputs olleH. Use System.out.
3.
   Assume two String objects str1 and str2 are initialized as follows:
       String str1 = "programming";
       String str2 = "language";
   Determine the value of each of the following expressions if they are valid. If they are not valid, state the reason why.
   a. str1.compareTo( str2 )
   b. str2.compareTo( str2 )
   c. str2.substring( 1, 1 )
   d. str2.substring( 0, 7 )
   e. str2.charAt( 11 )
   f. str1.length( ) + str2.length( )
4. What is the difference between the two String methods equals and equalsIgnoreCase?

9.3 Pattern Matching and Regular Expression

One sample code from Section 9.2 searched for the word Java in a given string. This sample code illustrated a very simplified version of a well-known problem called pattern matching. Word processor features such as finding a text and replacing a text with another text are two specialized cases of a pattern-matching problem. Because pattern matching is so common in many applications, two new classes, Pattern and Matcher, were added in Java 2 SDK 1.4. The String class was also modified to include several new methods that support pattern matching.

The matches Method

Let's begin with the matches method from the String class. In its simplest form, it looks very similar to the equals method. For example, given a string str, the two statements

    str.equals("Hello");
    str.matches("Hello");

both evaluate to true if str is the string Hello. However, they are not truly equivalent, because, unlike equals, the argument to the matches method can be a pattern, a feature that brings great flexibility and power to the matches method.

Suppose we assign a three-digit code to all incoming students. The first digit represents the major, and 5 stands for the computer science major.
The second digit represents the home state: 1 is for in-state students, 2 is for out-of-state students, and 3 is for foreign students. And the third digit represents the residence of the student. On-campus dormitories are represented by digits from 1 through 7. Students living off campus are represented by digit 8. For example, the valid encodings for students majoring in computer science and living off campus are 518, 528, and 538. The valid three-digit code for computer science majors living in one of the on-campus dormitories can be expressed succinctly as

    5[123][1-7]

and here's how we interpret the pattern:

    first digit    5        It must be 5 for the computer science majors.
    second digit   [123]    It must be 1, 2, or 3.
    third digit    [1-7]    It must be any digit from 1 to 7.

Such a pattern is called a regular expression; it allows us to denote a large (often infinite) set of words succinctly. The "word" is composed of any sequence of symbols and is not limited to letters. The brackets [ ] are used here to represent choices, so [123] means 1, 2, or 3. We can use the notation for letters also. For example, [aBc] means a, B, or c. Notice the notation is case-sensitive. The hyphen in the brackets shows the range, so [1-7] means any digit from 1 to 7. If we want to allow any lowercase letter, then the regular expression will be [a-z].

The hat symbol ^ is used for negation. For example, [^abc] means any character except a, b, or c. Notice that this expression does not restrict the character to lowercase letters; it can be any character including digits and symbols. To refer to all lowercase letters except a, b, or c, the correct expression is [a-z&&[^abc]]. The double ampersand represents an intersection.
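The student-code pattern and the negation and intersection notations can be tried directly with matches. Note that matches succeeds only when the whole string fits the pattern, so a four-character code such as 5271 is rejected. The class and method names in this sketch are hypothetical.

```java
class CharClassDemo {

    // 5 = computer science; [123] = home state; [1-7] = on-campus dormitory
    private static final String CS_ON_CAMPUS = "5[123][1-7]";

    static boolean isCsOnCampus(String code) {
        return code.matches(CS_ON_CAMPUS);
    }

    public static void main(String[] args) {
        System.out.println(isCsOnCampus("527"));   // true
        System.out.println(isCsOnCampus("518"));   // false: 8 means off campus
        System.out.println(isCsOnCampus("427"));   // false: not a CS major
        System.out.println(isCsOnCampus("5271"));  // false: the whole string must match

        // [^abc]: any character except a, b, or c, digits included
        System.out.println("7".matches("[^abc]"));         // true
        // [a-z&&[^abc]]: lowercase letters other than a, b, and c
        System.out.println("7".matches("[a-z&&[^abc]]"));  // false
        System.out.println("d".matches("[a-z&&[^abc]]"));  // true
        System.out.println("b".matches("[a-z&&[^abc]]"));  // false
    }
}
```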
Here are more examples:

    Expression        Description
    [013]             A single digit 0, 1, or 3.
    [0-9][0-9]        Any two-digit number from 00 to 99.
    A[0-4]b[05]       A string that consists of four characters. The first
                      character is A. The second character is 0, 1, 2, 3,
                      or 4. The third character is b. And the last
                      character is either 0 or 5.
    [0-9&&[^4567]]    A single digit that is 0, 1, 2, 3, 8, or 9.
    [a-z0-9]          A single character that is either a lowercase letter
                      or a digit.

We can use the repetition symbols * or + to designate a sequence of unbounded length. The symbol * means 0 or more times, and the symbol + means 1 or more times. Let's try an example using a repetition symbol. Remember the definition for a valid Java identifier? We define it as a sequence of alphanumeric characters, underscores, and dollar signs, with the first character being a letter. In regular expression, we can state this definition as

    [a-zA-Z][a-zA-Z0-9_$]*

Let's write a short program that will input a word and determine whether it is a valid Java identifier. The program stops when the word entered is STOP. Here's the program:

    /*
        Chapter 9 Sample Program: Checks whether the input string is
                                  a valid identifier.

        File: Ch9MatchJavaIdentifier.java
    */
    import javax.swing.*;

    class Ch9MatchJavaIdentifier {

        private static final String STOP = "STOP";
        private static final String VALID = "Valid Java identifier";
        private static final String INVALID = "Not a valid Java identifier";
        private static final String VALID_IDENTIFIER_PATTERN
                                          = "[a-zA-Z][a-zA-Z0-9_$]*";

        public static void main (String[] args) {

            String str, reply;

            while (true) {

                str = JOptionPane.showInputDialog(null, "Identifier:");

                if (str.equals(STOP)) break;

                if (str.matches(VALID_IDENTIFIER_PATTERN)) {
                    reply = VALID;
                } else {
                    reply = INVALID;
                }

                JOptionPane.showMessageDialog(null, str + ":\n" + reply);
            }
        }
    }

It is also possible to designate a sequence of fixed length. For example, to specify four-digit numbers, we write [0-9]{4}. The number in the braces { and } denotes the number of repetitions. We can specify the minimum and maximum numbers of repetitions also. Here are the rules:

    Expression   Description
    X{N}         Repeat X exactly N times, where X is a regular expression
                 for a single character.
    X{N,}        Repeat X at least N times.
    X{N,M}       Repeat X at least N but no more than M times.

Here's an example of using a sequence of fixed length. Suppose we want to determine whether the input string represents a valid phone number that follows the pattern of

    xxx-xxx-xxxx

where x is a single digit from 0 through 9. The following is a program that inputs a string continually and replies whether the input string conforms to the pattern. The program terminates when a single digit 0 is entered. Structurally this program is identical to the Ch9MatchJavaIdentifier class. Here's the program:

    /*
        Chapter 9 Sample Program: Checks whether the input string
                                  conforms to the phone number
                                  pattern xxx-xxx-xxxx.

        File: Ch9MatchPhoneNumber.java
    */
    import javax.swing.*;

    class Ch9MatchPhoneNumber {

        private static final String STOP = "0";
        private static final String VALID = "Valid phone number";
        private static final String INVALID = "Not a valid phone number";
        private static final String VALID_PHONE_PATTERN
                                      = "[0-9]{3}-[0-9]{3}-[0-9]{4}";

        public static void main (String[] args) {

            String phoneStr, reply;

            while (true) {

                phoneStr = JOptionPane.showInputDialog(null, "Phone#:");

                if (phoneStr.equals(STOP)) break;

                if (phoneStr.matches(VALID_PHONE_PATTERN)) {
                    reply = VALID;
                } else {
                    reply = INVALID;
                }

                JOptionPane.showMessageDialog(null, phoneStr + ":\n" + reply);
            }
        }
    }

Suppose, with the proliferation of cell phones, the number of digits used for a prefix increases from three to four in major cities. (In fact, Tokyo now uses a four-digit prefix. Phenomenal growth in the use of fax machines in both offices and homes caused the increase from three to four digits.) The valid format for phone numbers then becomes

    xxx-xxx-xxxx or xxx-xxxx-xxxx

This change can be handled effortlessly by defining VALID_PHONE_PATTERN as

    private static final String VALID_PHONE_PATTERN
                                  = "[0-9]{3}-[0-9]{3,4}-[0-9]{4}";

This is the power of regular expression and pattern-matching methods. All we need to do is to make one simple adjustment to the regular expression. No other changes are made to the program. Had we written the program without using the pattern-matching technique (i.e., written the program using repetition control to test the first to the last character individually), changing the code to handle both a three-digit and a four-digit prefix would have required substantially greater effort.

The period symbol (.) is used to match any character except a line terminator such as \n or \r.
(By using the Pattern class, we can make it match a line terminator also. We discuss the Pattern class in more detail later.) We can use the period symbol with the zero-or-more-times notation * to check whether a given string contains a sequence of characters we are looking for. For example, suppose a String object document holds the content of some document, and we want to check whether the phrase "zen of objects" is in it. We can do it as follows:

    String document;
    document = ...; //assign text to 'document'

    if (document.matches(".*zen of objects.*")) {
        System.out.println("Found");
    } else {
        System.out.println("Not found");
    }

Because the period does not match a line terminator, this test works only when document contains no line breaks. The DOTALL option of the Pattern class, described in Section 9.4, lifts this restriction.

The brackets [ and ] are used for expressing a range of choices for a single character. If we need to express a range of choices for multiple characters, then we use the parentheses and the vertical bar. For example, if we search for the word maximum or minimum, we express the pattern as

    (max|min)imum

Here are some more examples:

    Expression            Description
    [wb](ad|eed)          Matches wad, weed, bad, and beed.
    (pro|anti)-OOP        Matches pro-OOP and anti-OOP.
    (AZ|CA|CO)[0-9]{4}    Matches AZxxxx, CAxxxx, and COxxxx, where x is a single digit.

The replaceAll Method

The second method new to the version 1.4 String class is the replaceAll method. Using this method, we can replace all occurrences of a substring that matches a given regular expression with a given replacement string. For example, here's how to replace all lowercase vowels in the string with the @ symbol:

    String originalText, modifiedText;
    originalText = ...; //assign string to 'originalText'
    modifiedText = originalText.replaceAll("[aeiou]", "@");

Notice that the original text is unchanged. The replaceAll method returns the modified text as a separate string.
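The String pattern-matching calls discussed so far can be exercised without dialog boxes. The following is a minimal sketch and not one of the book's sample programs. The class name RegexBasicsSketch and its helper methods are hypothetical. The phrase search wraps the phrase in Pattern.quote, an assumption added so that any regular expression symbols in the phrase are matched literally.

```java
import java.util.regex.Pattern;

// Hypothetical sketch (not a book sample program) of String.matches and
// String.replaceAll with the patterns discussed in this section.
class RegexBasicsSketch {
    // Three digits, a dash, three or four digits, a dash, four digits.
    static final String PHONE_PATTERN = "[0-9]{3}-[0-9]{3,4}-[0-9]{4}";

    static boolean isValidPhone(String s) {
        return s.matches(PHONE_PATTERN);   // the whole string must match
    }

    // ".*" on both sides lets matches() succeed when the phrase occurs anywhere.
    // Without DOTALL, "." does not match a line terminator, so any line break
    // in the document makes this test fail.
    static boolean containsPhrase(String document, String phrase) {
        return document.matches(".*" + Pattern.quote(phrase) + ".*");
    }

    // Alternation: parentheses group the choices, | separates them.
    static boolean isMaxOrMin(String word) {
        return word.matches("(max|min)imum");
    }

    // replaceAll returns a new string; the argument itself is unchanged.
    static String maskVowels(String text) {
        return text.replaceAll("[aeiou]", "@");
    }

    public static void main(String[] args) {
        System.out.println(isValidPhone("555-123-4567"));   // true
        System.out.println(isValidPhone("555-1234-5678"));  // true
        System.out.println(isValidPhone("5551234567"));     // false
        System.out.println(containsPhrase("a zen of objects primer", "zen of objects")); // true
        System.out.println(isMaxOrMin("minimum"));          // true
        System.out.println(maskVowels("programming"));      // pr@gr@mm@ng
    }
}
```

Each call to String.matches or String.replaceAll recompiles its regular expression. Section 9.4 shows how to compile a pattern once and reuse it.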
Here are more examples:

    Expression                                     Description
    str.replaceAll("OOP",                          Replace all occurrences of OOP with
        "object-oriented programming")             object-oriented programming.
    str.replaceAll("[0-9]{3}-[0-9]{2}-[0-9]{4}",   Replace all social security numbers
        "xxx-xx-xxxx")                             with xxx-xx-xxxx.
    str.replaceAll("o{2,}", "oo")                  Replace every sequence of two or more
                                                   letter o's with oo.

If we want to match only the whole word, we have to use the \b symbol to designate the word boundary. Suppose we write

    str.replaceAll("temp", "temporary");

expecting to replace all occurrences of the abbreviated word temp with temporary. We will get a surprising result. All occurrences of the sequence of characters temp will be replaced; so, for example, words such as attempt and tempting would become attemporaryt and temporaryting, respectively. To designate the sequence temp as a whole word, we place the word boundary symbol \b at the beginning and end of the sequence:

    str.replaceAll("\\btemp\\b", "temporary");

Notice the use of two backslashes. The symbol we use in the regular expression is \b. However, we must write this regular expression as a Java String literal, and in a string literal the backslash starts an escape sequence such as \n, \t, and \r. (In fact, "\b" in a string literal is the backspace character.) To put a backslash into the regular expression itself, we must write an additional backslash, so the compiler does not interpret it as the start of an escape sequence. The regular expression we want here is

    \btemp\b

To put it in a String representation, we write

    "\\btemp\\b"

Here are the common backslash symbols used in regular expressions:

    Expression  String Representation  Description
    \d          "\\d"                  A single digit. Equivalent to [0-9].
    \D          "\\D"                  A single nondigit. Equivalent to [^0-9].
    \s          "\\s"                  A white space character, such as space, tab, new line, etc.
    \S          "\\S"                  A non-white-space character.
    \w          "\\w"                  A word character. Equivalent to [a-zA-Z_0-9].
    \W          "\\W"                  A nonword character.
    \b          "\\b"                  A word boundary (the position between a word character and a
                                       nonword character such as white space or a punctuation mark,
                                       or the beginning or end of the string).
    \B          "\\B"                  A nonword boundary.

We also use the backslash if we want to search for a character that has a special meaning in regular expressions. For example, the plus symbol designates one or more repetitions. If we want to search for the plus symbol in the text, we write \+ in the regular expression, and to express it as a string, we write "\\+". Here's an example. To replace all occurrences of C and C++ (not necessarily as a whole word) with Java, we write

    str.replaceAll("(C\\+\\+|C)", "Java");

The longer alternative must come first. Alternatives are tried from left to right, so with the pattern (C|C\+\+) the string C++ would become Java++.

1. Describe the string that the following regular expressions match.
   a. a*b
   b. b[aiu]d
   c. [Oo]bject(s| )
2. Write a regular expression for a state vehicle license number whose format is a single capital letter, followed by three digits and four lowercase letters.
3. Which of the following regular expressions are invalid?
   a. (a-z)*+
   b. [a|ab]xyz
   c. abe-14
   d. [a-z&&^a^b]
   e. [//one]two

9.4 The Pattern and Matcher Classes

The matches and replaceAll methods of the String class are shorthand for using the Pattern and Matcher classes from the java.util.regex package. We will describe how to use these two classes for more efficient pattern matching.
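The shorthand relationship just stated can be checked with a short sketch. The class PatternShorthandSketch and its methods are hypothetical names chosen for illustration. The calls themselves (Pattern.compile, Pattern.matches, Matcher.matches, Matcher.replaceAll) are standard java.util.regex methods. The sketch also reuses the word-boundary and escaped-plus examples from the end of Section 9.3.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: String.matches/replaceAll versus explicit Pattern/Matcher use.
class PatternShorthandSketch {
    // Compiled once and reused; the literal "\\b" becomes the regex \b.
    private static final Pattern WHOLE_WORD_TEMP = Pattern.compile("\\btemp\\b");

    // Same result as text.replaceAll("\\btemp\\b", "temporary").
    static String expandTemp(String text) {
        Matcher matcher = WHOLE_WORD_TEMP.matcher(text);
        return matcher.replaceAll("temporary");
    }

    // The three equivalent ways of testing a whole-string match.
    static boolean[] threeWays(String regex, String str) {
        boolean viaString = str.matches(regex);
        boolean viaPatternStatic = Pattern.matches(regex, str);
        boolean viaMatcher = Pattern.compile(regex).matcher(str).matches();
        return new boolean[] { viaString, viaPatternStatic, viaMatcher };
    }

    public static void main(String[] args) {
        // prints: temporary file, attempt, tempting, temporary
        System.out.println(expandTemp("temp file, attempt, tempting, temp"));
        // prints: Java and Java  (longer alternative listed first)
        System.out.println("C and C++".replaceAll("(C\\+\\+|C)", "Java"));
        boolean[] r = threeWays("[0-9]{4}", "2005");
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true true true
    }
}
```

The WHOLE_WORD_TEMP pattern is compiled only once, no matter how many times expandTemp is called. This is the efficiency advantage that the rest of this section explains.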
The statement

    str.matches(regex);

where str and regex are String objects, is equivalent to

    Pattern.matches(regex, str);

which in turn is equivalent to

    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(str);
    matcher.matches();

Similarly, the statement

    str.replaceAll(regex, replacement);

where replacement is a replacement text, is equivalent to

    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(str);
    matcher.replaceAll(replacement);

Explicit creation of Pattern and Matcher objects gives us more options and greater efficiency. We specify regular expressions as strings, but before the system can carry out the pattern-matching operation, the stated regular expression must first be converted to an internal format. This is done by the compile method of the Pattern class. When we use the matches method of the String or Pattern class, this conversion into the internal format is carried out every time the matches method is executed. So if we use the same pattern multiple times, it is more efficient to convert it just once instead of repeating the same conversion, as happens in the Ch9MatchJavaIdentifier and Ch9MatchPhoneNumber classes. The following is Ch9MatchJavaIdentifier2, a more efficient version of Ch9MatchJavaIdentifier:

    /*
      Chapter 9 Sample Program: Checks whether the input string is
                                a valid identifier. This version
                                uses the Matcher and Pattern classes.
      File: Ch9MatchJavaIdentifier2.java
    */
    import javax.swing.*;
    import java.util.regex.*;

    class Ch9MatchJavaIdentifier2 {

        private static final String STOP = "STOP";
        private static final String VALID = "Valid Java identifier";
        private static final String INVALID = "Not a valid Java identifier";
        private static final String VALID_IDENTIFIER_PATTERN = "[a-zA-Z][a-zA-Z0-9_$]*";

        public static void main (String[] args) {
            String  str, reply;
            Matcher matcher;
            Pattern pattern = Pattern.compile(VALID_IDENTIFIER_PATTERN); //compile once

            while (true) {
                str = JOptionPane.showInputDialog(null, "Identifier:");

                //showInputDialog returns null if the user cancels
                if (str == null || str.equals(STOP)) break;

                matcher = pattern.matcher(str);

                if (matcher.matches()) {
                    reply = VALID;
                } else {
                    reply = INVALID;
                }

                JOptionPane.showMessageDialog(null, str + ":\n" + reply);
            }
        }
    }

We have a number of options when the Pattern class compiles a regular expression into its internal format. For example, by default, the period symbol does not match the line terminator character. We can override this default by passing DOTALL as the second argument:

    Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);

To enable case-insensitive matching, we pass the CASE_INSENSITIVE constant. Several options can be combined with the bitwise OR operator, as in Pattern.DOTALL | Pattern.CASE_INSENSITIVE.

The find method is another powerful method of the Matcher class. This method searches the string for the next sequence that matches the pattern. The method returns true if the pattern is found. We can call the method repeatedly until it returns false to find all matches. Here's an example that counts the number of times the word java occurs in a given document. We will search for the word in a case-insensitive manner.

    /*
      Chapter 9 Sample Program: Count the number of times the word 'java'
                                occurs in the input, using the
                                pattern-matching technique.
      File: Ch9PMCountJava.java
    */
    import javax.swing.*;
    import java.util.regex.*;

    class Ch9PMCountJava {

        public static void main (String[] args) {
            String  document;
            int     javaCount;
            Matcher matcher;
            Pattern pattern = Pattern.compile("java", Pattern.CASE_INSENSITIVE);

            document = JOptionPane.showInputDialog(null, "Sentence:");

            javaCount = 0;
            matcher = pattern.matcher(document);

            while (matcher.find()) {
                javaCount++;
            }

            JOptionPane.showMessageDialog(null,
                        "The word 'java' occurred " + javaCount + " times.");
        }
    }

When a matcher finds a matching sequence of characters, we can query the location of the sequence by using the start and end methods. The start method returns the position in the string of the first character of the match, and the end method returns the position one past the last character of the match (that is, 1 more than the position of the last matching character). Here's the code that prints out the matching sequences and their locations in the string when searching for the word java in a case-insensitive manner.

    /*
      Chapter 9 Sample Program: Displays the positions where the word 'java'
                                occurs in a given string, using the
                                pattern-matching technique.

      File: Ch9PMCountJava2.java
    */
    import javax.swing.*;
    import java.util.regex.*;

    class Ch9PMCountJava2 {

        public static void main (String[] args) {
            String  document;
            int     javaCount;
            Matcher matcher;
            Pattern pattern = Pattern.compile("java", Pattern.CASE_INSENSITIVE);

            document = JOptionPane.showInputDialog(null, "Sentence:");

            javaCount = 0;
            matcher = pattern.matcher(document);
            while (matcher.find()) {
                System.out.println(document.substring(matcher.start(), matcher.end())
                                   + " found at position " + matcher.start());
            }
        }
    }

1. Replace the following statements with equivalent ones that use the Pattern and Matcher classes.
   a. str.replaceAll("1", "one");
   b. str.matches("alpha");
2. Using the find method of the Matcher class, check whether the given string document contains the whole word Java.

9.5 Comparing Strings

We already discussed how objects are compared in Chapter 5. The same rule applies to strings, but we have to be careful in certain situations because of the difference in the way a new String object is created. First, we will review how objects are compared. The difference between

    String word1, word2;
    ...
    if ( word1 == word2 ) ...

and

    if ( word1.equals(word2) ) ...

is illustrated in Figure 9.2. The equality test == is true if the contents of the variables are the same. For a primitive data type, the contents are the values themselves; but for a reference data type, the contents are addresses. So for a reference data type, the equality test is true if both variables refer to the same object, because they both contain the same address. The equals method, on the other hand, is true if the String objects to which the two variables refer contain the same string value. To distinguish the two types of comparisons, we will use the term equivalence test for the equals method.

[Figure 9.2 The difference between the equality test and the equals method.
 Case A: Referring to the same object ("Java"). word1 == word2 is true; word1.equals( word2 ) is true.
 Case B: Referring to different objects having identical string values ("Java" and "Java"). word1 == word2 is false; word1.equals( word2 ) is true.
 Case C: Referring to different objects having different string values ("Bali" and "Java"). word1 == word2 is false; word1.equals( word2 ) is false.
 Note: If x == y is true, then x.equals(y) is also true. The reverse is not always true.]

As long as we create a new String object using the new operator, as in

    String str = new String("Java");

the rule for comparing objects applies to comparing strings. However, when the new operator is not used, for example, in

    String str = "Java";

we have to be careful. Figure 9.3 shows the difference in assigning a String object to a variable. If we do not use the new operator, then string data are treated as if they were a primitive data type: when we use the same literal String constant more than once in a program, there is exactly one String object for it.

[Figure 9.3 Difference between using and not using the new operator for String.
 word1 = new String("Java"); word2 = new String("Java"); creates two :String objects, because whenever the new operator is used, there will be a new object.
 word1 = "Java"; word2 = "Java"; makes both variables refer to one :String object, because a literal string constant such as "Java" will always refer to the one object.]

1. Show the state of memory after the following statements are executed.

    String str1, str2, str3;
    str1 = "Jasmine";
    str2 = "Oolong";
    str3 = str2;
    str2 = str1;

9.6 StringBuffer and StringBuilder

A String object is immutable, which means that once a String object is created, we cannot change it. In other words, we can read individual characters in a string, but we cannot add, delete, or modify characters of a String object.
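The comparisons in Figures 9.2 and 9.3 and the immutability just described can be checked directly. This is a minimal sketch with hypothetical class and method names. The result for string literals relies on the Java language rule that identical string literals refer to the same String object.

```java
// Hypothetical sketch: equality (==) versus equivalence (equals) for strings,
// plus a check that String methods return new strings instead of modifying.
class StringComparisonSketch {
    // Figure 9.3 (bottom): identical literals refer to one shared object.
    static boolean literalsShareObject() {
        String word1 = "Java";
        String word2 = "Java";
        return word1 == word2;
    }

    // Figure 9.2, case B: new creates distinct objects with the same value.
    static boolean newObjectsDistinctButEquivalent() {
        String word1 = new String("Java");
        String word2 = new String("Java");
        return word1 != word2 && word1.equals(word2);
    }

    // Immutability: replace returns a new String; the original is untouched.
    static String[] immutabilityDemo() {
        String original = "Java";
        String changed = original.replace('J', 'L');
        return new String[] { original, changed };
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());              // true
        System.out.println(newObjectsDistinctButEquivalent());  // true
        String[] pair = immutabilityDemo();
        System.out.println(pair[0] + " " + pair[1]);            // Java Lava
    }
}
```

To compare string values, equals is the safe choice. The == operator only tells whether two variables refer to the same object.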
Remember that the methods of the String class, such as replaceAll and substring, do not modify the original string; they return a new string. Java adopts this immutability restriction to implement an efficient memory allocation scheme for managing String objects. Immutability is the reason why we can treat string data much like a primitive data type.

Creating a new string from the old one will work for most cases, but sometimes manipulating the content of a string directly is more convenient. When we need to compose a long string from a number of words, for example, being able to manipulate the content of a string directly is much more convenient than creating a new copy of a string. String manipulation here means operations such as replacing a character, appending a string with another string, deleting a portion of a string, and so forth. If we need to manipulate the content of a string directly, we must use either the StringBuffer or the StringBuilder class. Here's a simple example of modifying the string Java to Diva using a StringBuffer object:

    StringBuffer word = new StringBuffer( "Java" );
    word.setCharAt(0, 'D');
    word.setCharAt(1, 'i');

Notice that no new string is created; the original string Java is modified. Also, we must use the new operator to create a StringBuffer object.

The StringBuffer and StringBuilder classes behave the same way (i.e., they support the same set of public methods), but the StringBuilder class in general has better performance. The StringBuilder class is new to Java 2 SDK version 1.5, so it cannot be used with older versions of the Java SDK. There are advanced cases where you have to use the StringBuffer class, such as when one buffer is shared by multiple threads (the methods of StringBuffer are synchronized and those of StringBuilder are not), but for the sample string processing programs in this book, we can use either one of them.
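Here is a minimal sketch of the Java-to-Diva example using both classes. The class and method names are hypothetical. It also checks that append returns the same object it was called on, which is one way to see that no new object is created.

```java
// Hypothetical sketch: modifying a string in place with StringBuffer and
// StringBuilder (StringBuilder requires SDK 1.5 or later).
class MutableStringSketch {
    static String javaToDivaBuffer() {
        StringBuffer word = new StringBuffer("Java");
        word.setCharAt(0, 'D');    // now "Dava"
        word.setCharAt(1, 'i');    // now "Diva"
        return word.toString();
    }

    static String javaToDivaBuilder() {
        StringBuilder word = new StringBuilder("Java");
        word.setCharAt(0, 'D');
        word.setCharAt(1, 'i');
        return word.toString();
    }

    // append modifies the builder and returns a reference to that same builder.
    static boolean appendReturnsSameObject() {
        StringBuilder sb = new StringBuilder("Java");
        StringBuilder result = sb.append(" rocks");
        return result == sb && sb.toString().equals("Java rocks");
    }

    public static void main(String[] args) {
        System.out.println(javaToDivaBuffer());         // Diva
        System.out.println(javaToDivaBuilder());        // Diva
        System.out.println(appendReturnsSameObject());  // true
    }
}
```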
Of course, to use the StringBuilder class, we must be using version 1.5 of the SDK. We can also continue to use the StringBuffer class with version 1.5. Because the StringBuffer class can be used with all versions of the Java SDK, and string processing performance is not our major concern here, we will be using the StringBuffer class exclusively in this book. If string processing performance is a concern, then all we have to do is replace all occurrences of the word StringBuffer with StringBuilder in the program and run it with version 1.5 of the SDK.

Let's look at some examples using StringBuffer objects. The first example reads a sentence and replaces all vowels in the sentence with the character X.

    /*
      Chapter 9 Sample Program: Replace every vowel in a given sentence
                                with 'X' using StringBuffer.

      File: Ch9ReplaceVowelsWithX.java
    */
    import javax.swing.*;

    class Ch9ReplaceVowelsWithX {

        public static void main (String[] args) {
            StringBuffer tempStringBuffer;
            String       inSentence;
            int          numberOfCharacters;
            char         letter;

            inSentence = JOptionPane.showInputDialog(null, "Enter a sentence:");

            tempStringBuffer = new StringBuffer(inSentence);

            numberOfCharacters = tempStringBuffer.length();

            for (int index = 0; index < numberOfCharacters; index++) {

                letter = tempStringBuffer.charAt(index);

                if ( letter == 'a' || letter == 'A' ||
                     letter == 'e' || letter == 'E' ||
                     letter == 'i' || letter == 'I' ||
                     letter == 'o' || letter == 'O' ||
                     letter == 'u' || letter == 'U'    ) {

                    tempStringBuffer.setCharAt(index, 'X');
                }
            }

            System.out.println( "Input: " + inSentence + "\n");
            System.out.println( "Output: " + tempStringBuffer );
        }
    }

Notice how the input routine is done. We are reading in a String object and converting it to a StringBuffer object, because we cannot simply assign a String object to a StringBuffer variable.
For example, the following code is invalid and will not compile:

    StringBuffer strBuffer = JOptionPane.showInputDialog(null, "Enter a sentence:");

We are required to create a StringBuffer object from a String object, as in

    String str = "Hello";
    StringBuffer strBuf = new StringBuffer( str );

We cannot input StringBuffer objects. We have to input String objects and convert them to StringBuffer objects.

Our next example constructs a new sentence from input words that have an even number of letters. The program stops when the word STOP is read. Let's begin with the pseudocode:

    set tempStringBuffer to empty string;
    repeat = true;

    while ( repeat ) {
        read in next word;

        if (word is "STOP") {
            repeat = false;
        } else if (word has even number of letters) {
            append word to tempStringBuffer;
        }
    }

And here's the actual code:

    /*
      Chapter 9 Sample Program: Constructs a new sentence from
                                input words that have an even
                                number of letters.

      File: Ch9EvenLetterWords.java
    */
    import javax.swing.*;

    class Ch9EvenLetterWords {

        public static void main (String[] args) {
            boolean repeat = true;
            String  word;

            //create a StringBuffer object with an empty string
            StringBuffer tempStringBuffer = new StringBuffer("");

            while ( repeat ) {
                word = JOptionPane.showInputDialog(null, "Next word:");

                //showInputDialog returns null if the user cancels
                if ( word == null || word.equals("STOP") ) {
                    repeat = false;
                } else if ( word.length() % 2 == 0 ) {
                    //append the word and a space to tempStringBuffer
                    tempStringBuffer.append(word + " ");
                }
            }

            System.out.println( "Output: " + tempStringBuffer );
        }
    }

We use the append method to append a String or a StringBuffer object to the end of a StringBuffer object.
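The filtering logic of Ch9EvenLetterWords can be tested without dialog boxes. In this sketch the words arrive as an array instead of from JOptionPane, an assumption made so the code runs unattended. The class and method names are hypothetical.

```java
// Hypothetical sketch of the Ch9EvenLetterWords logic with array input.
class EvenLetterWordsSketch {
    static String evenLetterSentence(String[] words) {
        StringBuffer tempStringBuffer = new StringBuffer("");  // starts empty
        for (String word : words) {
            if (word.equals("STOP")) {
                break;                                   // same sentinel as the program
            }
            if (word.length() % 2 == 0) {
                tempStringBuffer.append(word + " ");     // word followed by a space
            }
        }
        return tempStringBuffer.toString();
    }

    public static void main(String[] args) {
        String[] input = { "Java", "is", "a", "fun", "language", "STOP", "ok" };
        System.out.println("Output: " + evenLetterSentence(input)); // Output: Java is language
    }
}
```

Words after STOP are ignored. In this example, "ok" has an even number of letters but is never appended.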
The method append can also take an argument of a primitive data type. For example, all the following statements are valid:

    int   i  = 12;
    float x  = 12.4f;
    char  ch = 'W';

    StringBuffer str = new StringBuffer("");
    str.append(i);
    str.append(x);
    str.append(ch);

Any primitive data type argument is converted to a string before it is appended to a StringBuffer object.

Notice that we can write the second example using only String objects. Here's how:

    boolean repeat = true;
    String  word, newSentence;

    newSentence = ""; //empty string

    while ( repeat ) {
        word = JOptionPane.showInputDialog(null, "Next word:");

        if ( word == null || word.equals("STOP") )
            repeat = false;
        else if ( word.length() % 2 == 0 )
            newSentence = newSentence + word + " "; //string concatenation
    }

Although this code does not explicitly use any StringBuffer object, the Java compiler may use StringBuffer when compiling the string concatenation operator. For example, the expression

    newSentence + word

can be compiled as if the expression were

    new StringBuffer().append(newSentence).append(word).toString()

In a loop like this one, using the append method of a single StringBuffer is preferable to using the string concatenation operator +. The StringBuffer version avoids creating a temporary string object on every repetition.

In addition to appending a string at the end of a StringBuffer, we can insert a string at a specified position by using the insert method. The syntax for this method is

    <StringBuffer>.insert( <insertIndex>, <value> );

where <insertIndex> must be greater than or equal to 0 and less than or equal to the length of <StringBuffer>, and <value> is an object or a value of a primitive data type. For example, to change the string

    Java is great

to

    Java is really great

we can execute

    StringBuffer str = new StringBuffer("Java is great");
    str.insert(8, "really ");

1. Determine the value of str after the following statements are executed.
   a. StringBuffer str = new StringBuffer( "Caffeine" );
      str.insert(0, "Dr. ");
   b. String str = "Caffeine";
      StringBuffer str1 = new StringBuffer( str.substring(1, 3) );
      str1.append('e');
      str = "De" + str1;
   c. String str = "Caffeine";
      StringBuffer str1 = new StringBuffer( str.substring(4, 8) );
      str1.insert(3, 'f');
      str = "De" + str1;
2. Assume a String object str is assigned a string value. Write a code segment that replaces all occurrences of lowercase vowels in the string with the letter C by using String and StringBuffer objects.
3. Find the errors in the following code.

    String str = "Caffeine";
    StringBuffer str1 = str.substring(1, 3);
    str1.append('e');
    System.out(str1);
    str1 = str1 + str;

9.7 Sample Development

Building Word Concordance

One technique to analyze a historical document or literature is to track word occurrences. A basic form of word concordance is a list of all words in a document and the number of times each word appears in the document. Word concordance is useful in revealing the writing style of an author. For example, given a word concordance of a document, we can scan the list and count the numbers of nouns, verbs, prepositions, and so forth. If the ratios of these grammatical elements differ significantly between the concordances of two documents, there is a high probability that the documents were not written by the same person. Another application of word concordance is seen in the indexing of a document, which, for each word, lists the page numbers or line numbers where it appears in the document. In this sample development, we will build a word concordance of a given document, utilizing the string-processing techniques we learned in this chapter.
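Here is a minimal sketch of the core idea behind a word concordance. It is not the book's Ch9WordConcordance class. It makes two assumptions. First, a java.util.TreeMap stands in for the WordList helper class, because a TreeMap keeps its keys in alphabetical order. Second, a word is taken to be a maximal run of ASCII letters and is counted case-insensitively. The class name ConcordanceSketch is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: TreeMap stands in for the book's WordList class, and a "word"
// is assumed to be a maximal run of ASCII letters, compared case-insensitively.
class ConcordanceSketch {
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    static Map<String, Integer> buildConcordance(String document) {
        Map<String, Integer> counts = new TreeMap<String, Integer>(); // sorted keys
        Matcher matcher = WORD.matcher(document);
        while (matcher.find()) {                         // next word anywhere in the text
            String word = matcher.group().toLowerCase();
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // One "word count" pair per line, in alphabetical order.
    static String format(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.append(entry.getKey()).append(' ').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "The cat saw the dog. The dog ran!";
        System.out.print(format(buildConcordance(doc)));
    }
}
```

The loop uses find rather than matches because each call to find locates the next word anywhere in the document. By contrast, matches tests the whole string against the pattern.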
One of the most popular search engine websites on the Internet today is Google (www.google.com). At the core of its innovative technology is a concordance of all Web pages on the Internet. Every month the company's Web crawler software visits 3 billion (and steadily growing) Web pages, and from these visits, a concordance is built. When the user enters a query, the Google servers search the concordance for a list of matching Web pages and return the list in the order of relevance.

Problem Statement

Write an application that will build a word concordance of a document. The output from the application is an alphabetical list of all words in the given document and the number of times they occur in the document. The document is a text file (its contents are ASCII characters), and the output of the program is also saved as an ASCII file.

Overall Plan

As usual, let's begin the program development by first identifying the major tasks of the program. The first task is to get a text document from a designated file. We will use a helper class called FileManager to do this task. File processing techniques to implement the FileManager class will be presented in Chapter 12. The whole content of an ASCII file is represented in the program as a single String object. Using a pattern-matching technique, we extract individual words from the document. For each distinct word in the document, we associate a counter and increment it every time the word is repeated. We will use a second helper class called WordList for maintaining a word list. An entry in this list has two components: a word and how many times this word occurs in the document. A WordList object can handle an unbounded number of entries. Entries in the list are arranged in alphabetical order.
We will learn how to implement the WordList class in Chapter 10.

We can express the program logic in pseudocode as

    while ( the user wants to process another file ) {
        Task 1: read the file;
        Task 2: build the word list;
        Task 3: save the word list to a file;
    }

Let's look at the three tasks and determine the objects that will be responsible for handling them. For the first task, we will use the helper class FileManager. For the second task of building a word list, we will define the Ch9WordConcordance class. An instance of this class will use the Pattern and Matcher classes for word extraction and another helper class, WordList, for maintaining the word list. The last task of saving the result is also done by the FileManager class.

Finally, we will define a top-level control object that manages all other objects. We will call this class Ch9WordConcordanceMain. This will be our instantiable main class. Here's our working design document:

    Design Document: Ch9WordConcordanceMain

    Class                     Purpose
    Ch9WordConcordanceMain    The instantiable main class of the program that implements
                              the top-level program control.
    Ch9WordConcordance        The key class of the program. An instance of this class
                              manages other objects to build the word list.
    FileManager               A helper class for opening a file and saving the result to
                              a file. Details of this class can be found in Chapter 12.
    WordList                  Another helper class for maintaining a word list. Details
                              of this class can be found in Chapter 10.
    Pattern/Matcher           Classes for pattern-matching operations.

Figure 9.4 is the working program diagram.

Figure 9.4 The program diagram for the Ch9WordConcordanceMain program (classes we implement: Ch9WordConcordanceMain, Ch9WordConcordance; helper classes provided to us: FileManager, WordList; system classes: Pattern, Matcher).
Base system classes such as String and JOptionPane are not shown.

Instead of the Pattern and Matcher classes, we could use the StringTokenizer class. This class is fairly straightforward to use if white space (tab, return, blank, etc.) is the only word delimiter. However, using this class becomes a little more complicated if we also need to treat punctuation marks and other characters as word delimiters. Overall, the Pattern and Matcher classes are more powerful than the StringTokenizer class and useful in many more types of applications.

We will implement this program in four major steps:

1. Start with a program skeleton. Define the main class with data members. To test the main class, we will also define a skeleton Ch9WordConcordance class with just a default constructor.
2. Add code to open a file and save the result. Extend the step 1 classes as necessary.
3. Complete the implementation of the Ch9WordConcordance class.
4. Finalize the code by removing temporary statements and tying up loose ends.

Step 1 Development: Skeleton

The design of Ch9WordConcordanceMain is straightforward, as its structure is very similar to that of other main classes. We will make this an instantiable main class and define the start method that implements the top-level control logic. We will define a default constructor to create instances of other classes. A skeleton Ch9WordConcordance class is also defined in this step so we can compile and run the main class. The skeleton Ch9WordConcordance class only has an empty default constructor.
The working design document for the Ch9WordConcordanceMain class is as follows:

    Design Document: The Ch9WordConcordanceMain Class

    Method          Visibility   Purpose
    <constructor>   public       Creates the instances of other classes in the program.
    start           private      Implements the top-level control logic of the program.

For the skeleton, the start method loops (doing nothing inside the loop in this step) until the user selects No on the confirmation dialog. Here's the skeleton:

    /*
      Chapter 9 Sample Development: Word Concordance

      File: Step1/Ch9WordConcordanceMain.java
    */
    import javax.swing.*;

    class Ch9WordConcordanceMain {

        private FileManager fileManager;

        private Ch9WordConcordance builder;

    //------------------------------
    //    Main method
    //------------------------------
        public static void main(String[] args) {
            Ch9WordConcordanceMain main = new Ch9WordConcordanceMain();
            main.start();
        }

        public Ch9WordConcordanceMain() {
            fileManager = new FileManager( );
            builder     = new Ch9WordConcordance( );
        }

        private void start( ) {
            int reply;

            while (true) {
                reply = JOptionPane.showConfirmDialog(null,
                                    "Run the program?",
                                    "Word List Builder",
                                    JOptionPane.YES_NO_OPTION);

                if (reply == JOptionPane.NO_OPTION) {
                    break;
                }
            }

            JOptionPane.showMessageDialog(null,
                        "Thank you for using the program\n" + "Good-Bye");
        }
    }

The skeleton Ch9WordConcordance class has only an empty default constructor. Here's the skeleton class:

    class Ch9WordConcordance {

        public Ch9WordConcordance() {

        }
    }

We run the program and verify that the constructor is executed correctly and that the repetition control in the start method works as expected.

Step 2 Development: Open and Save Files

In the second development step, we add routines to handle input and output.
The tasks of opening and saving a file are delegated to the service class FileManager. We will learn the implementation details of the FileManager class in Chapter 12. Our responsibility right now is to use the class correctly. The class provides two key methods: one to open a file and another to save a file. So that we can create and view the content easily, the FileManager class deals only with text files.

To save a text file, we call its saveFile method. There are two versions. With the first version, we pass the filename. For example, the code

    FileManager fm = new FileManager();
    String doc = ...; //assign string data
    fm.saveFile("output1.txt", doc);

will save the string data doc to a file named output1.txt. With the second version, we let the end user select a file, using the standard file dialog. A sample file dialog is shown in Figure 9.5. With the second version, we pass only the string data to be saved, as in

    fm.saveFile(doc);

When there’s an error in saving a file, an IOException is thrown.

To open a text file, we use one of the two versions of the openFile method. The distinction is identical to the one for the saveFile methods. The first version requires the filename to open. The second version allows the end user to select the file to open, so we pass no parameter. The openFile method throws a FileNotFoundException when the designated file cannot be found and an IOException when the designated file cannot be opened correctly.

Figure 9.5 A sample file dialog for opening a file.
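The book defers the implementation of FileManager to Chapter 12. Purely as an illustration, the sketch below shows one way the two filename-based methods could be written with the standard java.nio.file.Files class. The class name SimpleFileManager is hypothetical, the dialog-based versions are omitted, and saveFile returns void here; it is a sketch under those assumptions, not the book’s FileManager.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the book's FileManager: only the
// filename-based openFile/saveFile versions are sketched here.
class SimpleFileManager {

    // Reads the whole text file and returns its content as a String.
    public String openFile(String filename) throws FileNotFoundException, IOException {
        Path path = Paths.get(filename);
        if (!Files.exists(path)) {
            // mirror the documented contract: missing file -> FileNotFoundException
            throw new FileNotFoundException(filename);
        }
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Writes the string data to the named text file, replacing any old content.
    public void saveFile(String filename, String data) throws IOException {
        Files.write(Paths.get(filename), data.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        SimpleFileManager fm = new SimpleFileManager();
        fm.saveFile("output1.txt", "one 14\ntwo 3\n");
        System.out.print(fm.openFile("output1.txt"));

        try {
            fm.openFile("no-such-file.txt");
        } catch (FileNotFoundException e) {
            System.out.println("File not found.");
        }
    }
}
```

Checking Files.exists first lets the sketch throw FileNotFoundException for a missing file, matching the behavior described for openFile; Files.readAllBytes on its own would report a missing file with a NoSuchFileException instead.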
Here’s the summary of the FileManager class:

Public Methods of FileManager

public String openFile(String filename) throws FileNotFoundException, IOException
    Opens the text file filename and returns the content as a String.

public String openFile( ) throws FileNotFoundException, IOException
    Opens the text file selected by the end user, using the standard file open dialog, and returns the content as a String.

public String saveFile(String filename, String data) throws IOException
    Saves the string data to filename.

public String saveFile(String data) throws IOException
    Saves the string data to a file selected by the end user, using the standard file save dialog.

We modify the start method to open a file, create a word concordance, and then save the generated word concordance to a file. The method is defined as follows; the added portion is the three statements that open, build, and save:

    private void start( ) {
        int reply;
        String document, wordList;

        while (true) {
            reply = ...; //confirmation dialog reply

            if (reply == JOptionPane.NO_OPTION) {
                break;
            }

            document = inputFile();        //open file
            wordList = build(document);    //build concordance
            saveFile(wordList);            //save the generated concordance
        }

        ... //'Good-bye' message dialog
    }

The inputFile method is defined as follows, with a temporary output statement to verify the input routine:

    private String inputFile( ) {
        String doc = "";

        try {
            doc = fileManager.openFile( );

        } catch (FileNotFoundException e) {
            System.out.println("File not found.");

        } catch (IOException e) {
            System.out.println("Error in opening file: " + e.getMessage());
        }

        System.out.println("Input Document:\n" + doc); //TEMP

        return doc;
    }
Because the openFile method of FileManager throws exceptions, we handle them here with the try-catch block. The saveFile method is defined as follows:

    private void saveFile(String list) {
        try {
            fileManager.saveFile(list);

        } catch (IOException e) {
            System.out.println("Error in saving file: " + e.getMessage());
        }
    }

The method is very simple, as the hard work of actually saving the text data is done by our FileManager helper object. Finally, the build method is defined as

    private String build(String document) {
        String concordance;

        concordance = builder.build(document);

        return concordance;
    }

The Ch9WordConcordanceMain class is now complete. To run and test this class, we will define a stub build method for the Ch9WordConcordance class. The method is temporarily defined as

    public String build(String document) { //TEMP
        String list = "one 14\ntwo 3\nthree 3\nfour 5\nfive 92\n";
        return list; //TEMP
    }

We will implement the method fully in the next step. Here’s the final Ch9WordConcordanceMain class:

    /*
        Chapter 9 Sample Development: Word Concordance

        File: Step2/Ch9WordConcordanceMain.java
    */

    import java.io.*;
    import javax.swing.*;

    class Ch9WordConcordanceMain {
        ...

        private String build(String document) {
            String concordance;

            concordance = builder.build(document);

            return concordance;
        }

        private String inputFile( ) {
            String doc = "";

            try {
                doc = fileManager.openFile( );

            } catch (FileNotFoundException e) {
                System.out.println("File not found.");

            } catch (IOException e) {
                System.out.println("Error in opening file: " + e.getMessage());
            }

            System.out.println("Input Document:\n" + doc); //TEMP

            return doc;
        }

        private void saveFile(String list) {
            try {
                fileManager.saveFile(list);

            } catch (IOException e) {
                System.out.println("Error in saving file: " + e.getMessage());
            }
        }

        private void start( ) {
            while (true) {
                ...
                document = inputFile();
                wordList = build(document);
                saveFile(wordList);
            }
            ...
        }
    }

The temporary Ch9WordConcordance class now has the stub build method:

    class Ch9WordConcordance {
        ...
        public String build(String document) { //TEMP
            String list = "one 14\ntwo 3\nthree 3\nfour 5\nfive 92\n";
            return list; //TEMP
        }
    }

We are ready to run the program. The step 2 directory contains several sample input files. We will open them and verify that the file contents are read correctly by checking the temporary echo print output to System.out. To verify the output routine, we save the output (the temporary string returned by the stub build method of Ch9WordConcordance) to a file and verify its content. Since the output is a text file, we can use any word processor or text editor to view its contents. (Note: If we use Notepad on the Windows platform to view the file, it may not appear correctly. See the box below on how to avoid this problem.)

The control characters used for a line separator are not the same for each platform (Windows, Mac, Unix, etc.). One platform may use \n for a line separator while another platform may use \r\n. Even on the same platform, different software may not interpret the control characters in the same way.
To make our Java code work correctly across all platforms, we write, for example,

    String newline = System.getProperties().getProperty("line.separator");
    String output = "line 1" + newline + "line 2" + newline;

instead of

    String output = "line 1\nline 2\n";

Step 3 Development: Generate Word Concordance

In the third development step, we finish the program by implementing the Ch9WordConcordance class, specifically, its build method. Since we are using another helper class in this step, first we must find out how to use this helper class. The WordList class supports the maintenance of a word list. Every time we extract a new word from the document, we enter this word into a word list. If the word is already in the list, its count is incremented by 1. If the word occurs for the first time in the document, then the word is added to the list with its count initialized to 1. When we are done processing the document, we can get the word concordance from a WordList by calling its getConcordance method. The method returns the list as a single String with each line containing a word and its count in the following format:

    Chapter   2
    Early     1
    However   1
    In        2
    already   1
    also      1
    an        1
    and       7
    are       1
    as        2
    because   1

Because a single WordList object handles multiple documents, there’s a method called reset to clear the word list before processing the next document. Here’s the method summary:

Public Methods of WordList

public void add(String word)
    Increments the count for the given word. If the word is already in the list, its count is incremented by 1. If the word does not exist in the list, then it is added to the list with its count set to 1.

public String getConcordance( )
    Returns the word concordance in alphabetical order of words as a single string.
    Each line consists of a word and its count.

public void reset( )
    Clears the internal data structure so a new word list can be constructed. This method must be called every time before a new document is processed.

The general idea behind the build method of the Ch9WordConcordance class is straightforward. We need to keep extracting a word from the document, and for every word found, we add it to the word list. Expressed in pseudocode, we have

    while (document has more words) {
        word = next word in the document;
        wordList.add(word);
    }
    String concordance = wordList.getConcordance();

The most difficult part here is how to extract words from a document. We could write our own homemade routine to extract words, based on the technique presented in Section 9.2, but that is more work than the task warrants. Writing code that detects the various kinds of word terminators is not easy: in addition to the space, punctuation marks and control characters such as tab and newline all serve as word terminators. Conceptually, it is not that hard, but it can be quite tedious to iron out all the details. Instead, we can use the pattern-matching technique provided by the Pattern and Matcher classes for a reliable and efficient solution. The pattern for finding a word can be stated in a regular expression as

    \b\w+\b

Putting it in a string format, where each backslash must be doubled because the backslash is the escape character in Java string literals, results in

    "\\b\\w+\\b"
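To see exactly what this pattern treats as a word, the small program below lists every match in a sample string. The class name WordPatternDemo and the sample text are our own illustration. Because \w matches letters, digits, and the underscore, the pattern splits it’s into it and s and counts the digits of 1.4 as separate words; whether that is acceptable depends on the application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordPatternDemo {

    // \b\w+\b: one or more word characters [a-zA-Z_0-9] between word boundaries
    private static final Pattern WORD = Pattern.compile("\\b\\w+\\b");

    // Returns every substring of text that the word pattern matches, in order.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            // group() returns the same text as text.substring(matcher.start(), matcher.end())
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Good-Bye, it's JDK 1.4!"));
        // prints [Good, Bye, it, s, JDK, 1, 4]
    }
}
```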
The Pattern and Matcher objects are thus created as

    Pattern pattern = Pattern.compile("\\b\\w+\\b");
    Matcher matcher = pattern.matcher(document);

and the control loop to find and extract words is

    wordList.reset();
    while (matcher.find( )) {
        wordList.add(document.substring(matcher.start(), matcher.end()));
    }

Here’s the final Ch9WordConcordance class:

    /*
        Chapter 9 Sample Development: Word Concordance

        File: Step3/Ch9WordConcordance.java
    */

    import java.util.regex.*;

    class Ch9WordConcordance {

        private static final String WORD = "\\b\\w+\\b";

        private WordList wordList;

        private Pattern pattern;

        public Ch9WordConcordance() {
            wordList = new WordList();
            pattern  = Pattern.compile(WORD); //pattern is compiled only once
        }

        public String build(String document) {
            Matcher matcher = pattern.matcher(document);

            wordList.reset();
            while (matcher.find()) {
                wordList.add(document.substring(matcher.start(), matcher.end()));
            }

            return wordList.getConcordance();
        }
    }

Notice how short the class is, thanks to the power of pattern matching and the helper WordList class. We run the program against varying types of input text files. We can use a long document such as the term paper for last term’s economics class (don’t forget to save it as a text file before testing). We should also use some specially created files for testing purposes. One file may contain only one word repeated 7 times, for example. Another file may contain no words at all. We verify that the program works correctly for all types of input files.
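The WordList class is not shown in this chapter. To try the idea end to end without it, the sketch below substitutes a java.util.TreeMap for WordList: Map.merge plays the role of add, a fresh map per call plays the role of reset, and walking the sorted entries produces the getConcordance string. The class name ConcordanceSketch and the "word count" line format are assumptions modeled on the stub output used in step 2, not the book’s WordList.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for Ch9WordConcordance + WordList: a TreeMap plays
// the role of the word list, keeping words sorted and counting occurrences.
class ConcordanceSketch {

    private static final Pattern WORD = Pattern.compile("\\b\\w+\\b");

    public String build(String document) {
        Map<String, Integer> counts = new TreeMap<>(); // a fresh map acts as reset()
        Matcher matcher = WORD.matcher(document);
        while (matcher.find()) {
            // add(word): insert with count 1, or increment an existing count
            counts.merge(matcher.group(), 1, Integer::sum);
        }

        // getConcordance(): one "word count" line per word, in sorted order
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            sb.append(entry.getKey()).append(' ').append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ConcordanceSketch builder = new ConcordanceSketch();
        System.out.print(builder.build("the cat and the hat"));
        // prints:
        // and 1
        // cat 1
        // hat 1
        // the 2
    }
}
```

Because TreeMap orders String keys by character code, capitalized words sort before lowercase ones, as in the sample concordance format shown earlier. The test files suggested above behave as expected: a document containing only "go go go" yields the single line go 3, and a document with no words yields an empty string.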
Step 4 Development: Finalize

As always, we finalize the program in the last step. We perform a critical review to find any inconsistency or error in the methods, any incomplete methods, places to add more comments, and so forth.

In addition, we may consider possible extensions. One is an integrated user interface where the end user can view both the input document files and the output word list files. Another is the generation of different types of lists. In the sample development, we count the number of occurrences of each word. Instead, we could generate a list of positions where each word appears in the document. The WordList class itself would need to be modified for such an extension.

Summary

• The char data type represents a single character.
• A char constant is enclosed in single quotation marks, for example, ‘a’.
• The character coding scheme used widely today is ASCII (American Standard Code for Information Interchange).
• Java uses Unicode, which is capable of representing characters of diverse languages. ASCII is compatible with Unicode: the first 128 Unicode characters have the same codes as in ASCII.
• A string is a sequence of characters, and in Java, strings are represented by String objects.
• The Pattern and Matcher classes were introduced in Java 2 SDK 1.4. They provide support for pattern-matching applications.
• A regular expression is used to represent a pattern to match (search for) in a given text.
• String objects are immutable. Once they are created, they cannot be changed.
• To manipulate mutable strings, use StringBuffer or StringBuilder.
• Strings are objects in Java, and the rules for comparing objects apply when comparing strings. Only one String object is created for the same literal String constants.
• The standard classes described or used in this chapter are String, StringBuffer, StringBuilder, Pattern, and Matcher.

Key Concepts

characters
strings
string processing
regular expression
pattern matching
character encoding
String comparison

Exercises

1. What is the difference between ‘a’ and “a”?
2. Discuss the difference between

    str = str + word;  //string concatenation

and

    tempStringBuffer.append(word);

where str is a String object and tempStringBuffer is a StringBuffer object.
3. Show that if x and y are String objects and x == y is true, then x.equals(y) is also true, but the reverse is not necessarily true.
4. What will be the output from the following code?

    StringBuffer word1, word2;
    word1 = new StringBuffer("Lisa");
    word2 = word1;
    word2.insert(0, "Mona ");
    System.out.println(word1);

5. Show the state of memory after the execution of each statement in the following code.

    String word1, word2;
    word1 = "Hello";
    word2 = word1;
    word1 = "Java";

6. Using a state-of-memory diagram, illustrate the difference between a null string and an empty string, that is, a string that has no characters in it. Show the state-of-memory diagram for the following code. Variable word1 is a null string, while word2 is an empty string.

    String word1, word2;
    word1 = null;
    word2 = "";

7. Draw a state-of-memory diagram for each of the following groups of statements.

    String word1, word2;          String word1, word2;
    word1 = "French Roast";       word1 = "French Roast";
    word2 = word1;                word2 = "French Roast";

8. Write a GUI application that reads in a character and displays the character’s ASCII code.
The getText method of the JTextField class returns a String object, so you need to extract a char value, as in

    String inputString = inputField.getText();
    char character = inputString.charAt(0);

Display an error message if more than one character is entered.
9. Write a method that returns the number of uppercase letters in a String object passed to the method as an argument. Use the class method isUpperCase of the Character class, which returns true if the passed parameter of type char is an uppercase letter. You need to explore the Character class from the java.lang package on your own.
10. Redo Exercise 9 without using the Character class. Hint: The ASCII code of any uppercase letter falls between 65 (code for ‘A’) and 90 (code for ‘Z’).
11. Write a program that reads a sentence and prints out the sentence with all uppercase letters changed to lowercase and all lowercase letters changed to uppercase.
12. Write a program that reads a sentence and prints out the sentence in reverse order. For example, the program will display

    ?uoy era woH

for the input

    How are you?

13. Write a method that transposes words in a given sentence. For example, given an input sentence

    The gate to Java nirvana is near

the method outputs

    ehT etag ot avaJ anavrin si raen

To simplify the problem, you may assume the input sentence contains no punctuation marks. You may also assume that the input sentence starts with a nonblank character and that there is exactly one blank space between the words.
14. Improve the method in Exercise 13 by removing the assumptions. For example, an input sentence could be

    Hello, how are you? I use JDK 1.2.2. Bye-bye.

An input sentence may contain punctuation marks and more than one blank space between two words. Transposing the above will result in

    olleH, woh era uoy? I esu KDJ 1.2.2. eyB-eyb.
Notice that the positions of punctuation marks do not change and only one blank space is inserted between the transposed words.
15. The Ch9CountWords program that counts the number of words in a given sentence has a bug. If the input sentence has one or more blank spaces at the end, the value for wordCount will be 1 more than the actual number of words in the sentence. Correct this bug in two ways: one with the trim method of the String class and another without using this method.
16. The Ch9ExtractWords program for extracting words in a given sentence includes the test

    if (beginIdx != endIdx) ...

Describe the type of input sentences that will result in the variables beginIdx and endIdx becoming equal.
17. Write an application that reads in a sentence and displays the count of individual vowels in the sentence. Use any output routine of your choice to display the result in this format. Count only the lowercase vowels.

    Vowel counts for the sentence
    Mary had a little lamb.

    # of 'a' : 4
    # of 'e' : 1
    # of 'i' : 1
    # of 'o' : 0
    # of 'u' : 0

18. Write an application that determines if an input word is a palindrome. A palindrome is a string that reads the same forward and backward, for example, noon and madam. Ignore the case of the letters. So, for example, maDaM, MadAm, and mAdaM are all palindromes.
19. Write an application that determines if an input sentence is a palindrome, for example,

    A man, a plan, a canal, Panama!

You ignore the punctuation marks, blanks, and case of the letters.

Development Exercises

For the following exercises, use the incremental development methodology to implement the program. For each exercise, identify the program tasks, create a design document with class descriptions, and draw the program diagram. Map out the development steps at the start.
Present any design alternatives and justify your selection. Be sure to perform adequate testing at the end of each development step.
20. Write an Eggy-Peggy program. Given a string, convert it to a new string by placing egg in front of every vowel. For example, the string

    I Love Java

becomes

    eggI Leggovegge Jeggavegga

21. Write a variation of the Eggy-Peggy program. Implement the following four variations:
    • Sha: Add sha to the beginning of every word.
    • Na: Add na to the end of every word.
    • Sha Na Na: Add sha to the beginning and na na to the end of every word.
    • Ava: Move the first letter to the end of the word and add ava to it.
Allow the user to select one of the four possible variations. Use JOptionPane for input.
22. Write a word guessing game. The game is played by two players, each taking a turn in guessing the secret word entered by the other player. Ask the first player to enter a secret word. After a secret word is entered, display a hint that consists of a row of dashes, one for each letter in the secret word. Then ask the second player to guess a letter in the secret word. If the letter is in the secret word, replace the dashes in the hint with the letter at all positions where this letter occurs in the word. If the letter does not appear in the word, the number of incorrect guesses is incremented by 1. The second player keeps guessing letters until either
    • The player guesses all the letters in the word, or
    • The player makes 10 incorrect guesses.
Here’s a sample interaction, with the letter entered by the player shown to the right of each hint:

    - - - -    S
    - - - -    A
    - A - A    V
    - A V A    D
    - A V A    J
    J A V A

    Bingo! You won.

Support the following features:
    • Accept an input in either lowercase or uppercase.
    • If the player enters something other than a single letter (a digit, special character, multiple letters, etc.), display an error message. The number of incorrect guesses is not incremented.
    • If the player enters the same correct letter more than once, reply with the previous hint.
    • Entering an incorrect letter a second time is counted as another wrong guess. For example, suppose the letter W is not in the secret word. Every time the player enters W as a guess, the number of incorrect guesses is incremented by 1.
After a game is over, switch the roles of the players and continue with another game. When it is the first player’s turn to enter a secret word, give the players an option to stop playing. Keep the tally and announce the winner at the end of the program. The tally will include for each player the number of wins and the total number of incorrect guesses made for all games. The player with more wins is the winner. In the case where both players have the same number of wins, the one with the lower number of total incorrect guesses is the winner. If the total numbers of incorrect guesses for both players are also the same, then it is a draw.
23. Write another word guessing game similar to the one described in Exercise 22. For this word game, instead of using a row of dashes for a secret word, a hint is provided by displaying the letters in the secret word in random order. For example, if the secret word is COMPUTER, then a possible hint is MPTUREOC. The player has only one chance to enter a guess. The player wins if the guess is correct. Time how long the player took to guess the secret word. After a guess is entered, display whether the guess is correct or not. If correct, display the amount of time in minutes and seconds used by the player. The tally will include for each player the number of wins and the total amount of time taken for guessing the secret words correctly (the amount of time used for incorrect guesses is not tallied).
The player with more wins is the winner. In the case where both players have the same number of wins, the one who used the lesser amount of time for correct guesses is the winner. If the total time used by both players is also the same, then it is a draw.
24. The word game Eggy-Peggy is an example of encryption. Encryption has been used since ancient times to communicate messages secretly. One of the many techniques used for encryption is called a Caesar cipher. With this technique, each character in the original message is shifted N positions. For example, if N = 1, then the message

    I drink only decaf

becomes

    J!esjol!pomz!efdbg

The encrypted message is decrypted to the original message by shifting every character back N positions. Shifting N positions forward and backward is achieved by converting the character to ASCII and adding or subtracting N. Write an application that reads in the original text and the value for N and displays the encrypted text. Make sure the ASCII value resulting from encryption falls between 32 and 126. For example, if you add 8 (value of N) to 122 (ASCII code for ‘z’), you should “wrap around” and get 35. Write another application that reads the encrypted text and the value for N and displays the original text by using the Caesar cipher technique. Design a suitable user interface.
25. Another encryption technique is called a Vigenère cipher. This technique is similar to a Caesar cipher in that a key is applied cyclically to the original message. For this exercise a key is composed of uppercase letters only. Encryption is done by adding the code values of the key’s characters to the code values of the characters in the original message. Code values for the key characters are assigned as follows: 0 for A, 1 for B, 2 for C, . . .
, and 25 for Z. Let’s say the key is COFFEE and the original message is I drink only decaf. The key is repeated (COFFEECOFFEECOFFEE) to line up with the 18 characters of the message, and encryption works as follows:

    I       + C  ->  K
    (blank) + O  ->  .
    d       + F  ->  i
    r       + F  ->  w
    . . .
    f       + E  ->  j

Decryption reverses the process to generate the original message. Write an application that reads in a text and displays the encrypted text. Make sure the ASCII value resulting from encryption or decryption falls between 32 and 126. You can get the code for key characters by (int) keyChar - 65.

Write another application that reads the encrypted text and displays the original text, using the Vigenère cipher technique.
26. Public-key cryptography allows anyone to encode messages while only people with a secret key can decipher them. In 1977, Ronald Rivest, Adi Shamir, and Leonard Adleman developed a form of public-key cryptography called the RSA system. To encode a message using the RSA system, one needs n and e. The value n is the product of two prime numbers p and q. The value e is a number greater than 1 and less than y that shares no common factor with y other than 1, that is, $\gcd(e, y) = 1$, where $y = (p - 1) \times (q - 1)$; otherwise the value d described below does not exist. The values n and e can be published in a newspaper or posted on the Internet, so anybody can encrypt messages. The original character is encoded to a numerical value c by using the formula

    $c = m^e \bmod n$

where m is a numerical representation of the original character (for example, 1 for A, 2 for B, and so forth). Now, to decode a message, one needs d. The value d is a number that satisfies the formula

    $(e \cdot d) \bmod y = 1$

where e and y are the values defined in the encoding step. The original character m can be derived from the encrypted character c by using the formula

    $m = c^d \bmod n$

Write a program that encodes and decodes messages using the RSA system.
Use large prime numbers for p and q in computing the value for n, because when p and q are small, it is not that difficult to find the value of d. When p and q are very large, however, it becomes practically impossible to determine the value of d. Use the ASCII values as appropriate for the numerical representation of characters. Visit http://www.rsasecurity.com for more information on how the RSA system is applied in the real world.