Wu (Otani): Introduction to
Object−Oriented
Programming with Java,
4th Edition
9. Characters and Strings
9
© The McGraw−Hill
Companies, 2005
Text
Characters and Strings
Objectives
After you have read and studied this chapter, you should be able to
• Declare and manipulate data of the char type.
• Write string processing programs, using String, StringBuilder, and StringBuffer objects.
• Specify regular expressions for searching a pattern in a string.
• Differentiate the String, StringBuilder, and StringBuffer classes and use the correct class in solving a given task.
• Tell the difference between equality and equivalence testings for String objects.
• Use the Pattern and Matcher classes.
Introduction
Early computers in the 1940s and 1950s were more like gigantic calculators because
they were used primarily for numerical computation. However, as computers have
evolved to possess more computational power, our use of computers is no longer
limited to numerical computation. Today we use computers for processing information of diverse types. In fact, most application software today such as Web
browsers, word processors, database management systems, presentation software,
and graphics design software is not intended specifically for number crunching.
These programs still perform numerical computation, but their primary data are
text, graphics, video, and other nonnumerical data. We have already seen examples
of nonnumerical data processing. We introduced the String class and string processing in Chapter 2. A nonnumerical data type called boolean was used in Chapters 5
and 6. In this chapter, we will delve more deeply into the String class and present
advanced string processing. We will also introduce the char data type for representing a single character and the StringBuffer class, which supports certain kinds of string processing more efficiently.
9.1 Characters
In Java single characters are represented by using the data type char. Character
constants are written as symbols enclosed in single quotes, for example, ‘a’, ‘X’,
and ‘5’. Just as we use different formats to represent integers and real numbers
using 0s and 1s in computer memory, we use special codes of 0s and 1s to represent single characters. For example, we may assign 1 to represent ‘A’ and 2 to represent ‘B’. We can assign codes similarly to lowercase letters, punctuation marks,
digits, and other special symbols. In the early days of computing, different computers used not only different coding schemes but also different character sets. For
example, one computer could represent the symbol ¼, while other computers
could not. Individualized coding schemes did not allow computers to share information. Documents created by using one scheme are complete gibberish if we try
to read these documents by using another scheme. To avoid this problem, U.S.
computer manufacturers devised several coding schemes. One of the coding
schemes widely used today is ASCII (American Standard Code for Information
Interchange). We pronounce ASCII “ăs kē.” Table 9.1 shows the 128 standard
ASCII codes.
Adding the row and column indexes gives you the ASCII code for a given
character. For example, the value 87 is the ASCII code for the character ‘W’. Not all
characters in the table are printable. ASCII codes 0 through 31 and 127 are nonprintable control characters. For example, ASCII code 7 is the bell (the computer
beeps when you send this character to output), and code 9 is the tab.
When we use a word processor to create a document, the file that contains the
document includes not only the contents but also the formatting information.
Since each software company uses its own coding scheme for storing this information, we have to use the same word processor to open the document. Often it
is even worse. We cannot open a document created by a newer version of the
same word processor with an older version. If we just want to exchange the text of
a document, then we can convert it to ASCII format. Any word processor can open
and save ASCII files. If we would like to retain the formatting information also, we
can convert the document, using software such as Adobe Acrobat. This software
converts a document (including text, formatting, images, etc.) created by different
word processors to a format called PDF. Anybody with a free Acrobat Reader can
open a PDF file. Many of the documents available from our website are in this PDF
format.
To represent all 128 ASCII codes, we need 7 bits ranging from 000 0000 (0)
to 111 1111 (127). Although 7 bits is enough, ASCII codes occupy 1 byte (8 bits) because the byte is the smallest unit of memory you can access. Computer manufacturers use the extra bit for other nonstandard symbols (e.g., lines and boxes). Using
8 bits, we can represent 256 symbols in total—128 standard ASCII codes and 128
nonstandard symbols.
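The bit counts above can be checked with a few lines of Java. This is only a sketch; the class name BitCapacity and its helper method distinctCodes are ours, introduced for illustration.

```java
// Illustrative sketch: how many distinct codes fit in a given number of bits.
class BitCapacity {

    // Returns 2 raised to the power bits, i.e., the number of distinct
    // bit patterns. (Hypothetical helper; valid for 0 <= bits <= 30.)
    static int distinctCodes(int bits) {
        return 1 << bits;
    }

    public static void main(String[] args) {
        System.out.println(distinctCodes(7));             // 128 standard ASCII codes
        System.out.println(distinctCodes(8));             // 256 symbols in one byte
        System.out.println(Integer.toBinaryString(127));  // 1111111 (seven 1 bits)
    }
}
```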
Table 9.1 ASCII codes.

         0     1     2     3     4     5     6     7     8     9
   0    nul   soh   stx   etx   eot   enq   ack   bel   bs    ht
  10    lf    vt    ff    cr    so    si    dle   dc1   dc2   dc3
  20    dc4   nak   syn   etb   can   em    sub   esc   fs    gs
  30    rs    us    sp    !     "     #     $     %     &     '
  40    (     )     *     +     ,     -     .     /     0     1
  50    2     3     4     5     6     7     8     9     :     ;
  60    <     =     >     ?     @     A     B     C     D     E
  70    F     G     H     I     J     K     L     M     N     O
  80    P     Q     R     S     T     U     V     W     X     Y
  90    Z     [     \     ]     ^     _     `     a     b     c
 100    d     e     f     g     h     i     j     k     l     m
 110    n     o     p     q     r     s     t     u     v     w
 120    x     y     z     {     |     }     ~     del
The standard ASCII codes work just fine as long as we are dealing with the
English language because all letters and punctuation marks used in English are
included in the ASCII codes. We cannot say the same for other languages. For languages such as French and German, the additional 128 codes may be used to represent character symbols not available in standard ASCII. But what about different
currency symbols? What about non-European languages? Chinese, Japanese, and
Korean all use different coding schemes to represent their character sets. Eight bits
is not enough to represent thousands of ideographs. If we try to read Japanese characters by using ASCII, we will see only meaningless symbols.
To accommodate the character symbols of non-English languages, the
Unicode Consortium established the Unicode Worldwide Character Standard,
commonly known simply as Unicode, to support the interchange, processing, and
display of the written texts of diverse languages. The standard currently contains
34,168 distinct characters, which cover the major languages of the Americas,
Europe, the Middle East, Africa, India, Asia, and Pacifica. To accommodate such a
large number of distinct character symbols, Unicode characters occupy 2 bytes.
Unicode codes for the character set shown in Table 9.1 are the same as ASCII
codes.
Java, being a language for the Internet, uses the Unicode standard for representing char constants. Although Java uses the Unicode standard internally to store
characters, to use foreign characters for input and output in our programs, the operating system and the development tool we use for Java programs must be capable of
handling the foreign characters.
Characters are declared and used in a manner similar to data of other types.
The declaration
char ch1, ch2 = 'X';
declares two char variables ch1 and ch2 with ch2 initialized to ‘X’. We can display
the ASCII code of a character by converting it to an integer. For example, we can
execute
    JOptionPane.showMessageDialog(null, "ASCII code of character X is "
                                        + (int)'X' );
Conversely, we can see a character by converting its ASCII code to the char data
type, for example,
    JOptionPane.showMessageDialog(null,
            "Character with ASCII code 88 is " + (char)88 );
Because the characters have numerical ASCII values, we can compare characters just as we compare integers and real numbers. For example, the comparison
'A' < 'c'
returns true because the ASCII value of ‘A’ is 65 while that of ‘c’ is 99.
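The conversions and the comparison just described can be gathered into one small program. This is a minimal sketch that prints to System.out instead of a dialog so it runs without a GUI; the class name CharCodeDemo and the helpers codeOf and charOf are hypothetical names of ours.

```java
// Illustrative sketch: converting between char and int, and comparing characters.
class CharCodeDemo {

    // Returns the numerical code of a character (hypothetical helper).
    static int codeOf(char ch) {
        return (int) ch;
    }

    // Returns the character that has the given code (hypothetical helper).
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        char ch1, ch2 = 'X';
        ch1 = charOf(88);

        System.out.println(codeOf(ch2));   // 88
        System.out.println(ch1);           // X
        System.out.println('A' < 'c');     // true, because 65 < 99
        System.out.println(ch1 == ch2);    // true, both hold the code 88
    }
}
```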
1. Determine the output of the following statements.
a. System.out.println( (char) 65 );
b. System.out.println( (int) 'C' );
c. System.out.println( 'Y' );
d. if ( 'A' < '?' )
       System.out.println( 'A' );
   else
       System.out.println( '?' );
2. How many distinct characters can you represent by using 8 bits?
9.2 Strings
A string is a sequence of characters that is treated as a single value. Instances of the
String class are used to represent strings in Java. Rudimentary string processing
was already presented in Chapter 2, using methods such as substring, length, and
indexOf. In this section we will learn more advanced string processing, using other
methods of the String class.
To introduce additional methods of the String class, we will go through a number of common string processing routines. The first is to process a string looking for
a certain character or characters. Let’s say we want to input a person’s name and determine the number of vowels that the name contains. The basic idea is very simple:
    for each character ch in the string {
        if (ch is a vowel) {
            increment the counter
        }
    }
There are two details we need to know before being able to translate that into actual
code. First, we need to know how to refer to an individual character in the string.
Second, we need to know how to determine the size of the string, that is, the number of characters the string contains, so we can write the boolean expression to stop
the loop correctly. We know from Chapter 2 that the second task is done by using
the length method. For the first task, we use charAt.
We access individual characters of a string by calling the charAt method of
the String object. For example, to display the individual characters of the string
Sumatra one at a time, we can write
    String name = "Sumatra";
    int    size = name.length();

    for (int i = 0; i < size; i++) {
        System.out.println(name.charAt(i));
    }
[Figure 9.1 An indexed expression is used to refer to individual characters in a string. For String name = "Sumatra", the characters S, u, m, a, t, r, a occupy positions 0 through 6. The variable name refers to the whole string, and the method call name.charAt(3) returns the character at position 3.]
Each character in a string has an index that we use to access the character. We
use zero-based indexing; that is, the first character has index 0, the second character has index 1, the third character has index 2, and so forth. To refer to the first character of name, for example, we say
name.charAt(0)
Since the characters are indexed from 0 to size-1, we could express the
preceding for loop as
for (int i = 0; i <= size - 1; i++)
However, we will use the first style almost exclusively to be consistent.
Figure 9.1 illustrates how the charAt method works. Notice that name refers
to a String object, and we are calling its charAt method that returns a value of primitive data type char. Strictly speaking, we must say “name is a variable of type
String whose value is a reference to an instance of String.” However, when the value
of a variable X is a reference to an instance of class Y, we usually say “X is an instance of Y” or “X is a Y object.”
If the value of a variable X is a reference to an object of class Y, then we say “X is a Y
object” or “X is an instance of Y.”
Since String is a class, we can create an instance of the class by using the new
operator. The statements we have been using so far, such as
String name1 = "Kona";
String name2;
name2 = "Espresso";
work as a shorthand for
String name1 = new String("Kona");
String name2;
name2 = new String("Espresso");
Be aware that this shorthand works for the String class only. Moreover, although the
difference will not be critical in almost all situations, they are not exactly the same.
We will discuss the subtle difference between the two in Section 9.5.
Here is the code for counting the number of vowels:
/*
    Chapter 9 Sample Program: Count the number of vowels
                              in a given string
    File: Ch9CountVowels.java
*/
import javax.swing.*;

class Ch9CountVowels {

    public static void main (String[] args) {
        String  name;
        int     numberOfCharacters,
                vowelCount = 0;
        char    letter;

        name = JOptionPane.showInputDialog(null, "What is your name?");
        numberOfCharacters = name.length();

        for (int i = 0; i < numberOfCharacters; i++) {
            letter = name.charAt(i);

            if ( letter == 'a' || letter == 'A' ||
                 letter == 'e' || letter == 'E' ||
                 letter == 'i' || letter == 'I' ||
                 letter == 'o' || letter == 'O' ||
                 letter == 'u' || letter == 'U'    ) {
                vowelCount++;
            }
        }

        JOptionPane.showMessageDialog(null, name + ", your name has " +
                                            vowelCount + " vowels");
    }
}
We can shorten the boolean expression in the if statement by using the
toUpperCase method of the String class. This method converts every character in a
string to uppercase. Here’s the rewritten code:
/*
    Chapter 9 Sample Program: Count the number of vowels
                              in a given string using toUpperCase
    File: Ch9CountVowels2.java
*/
import javax.swing.*;

class Ch9CountVowels2 {

    public static void main (String[] args) {
        String  name, nameUpper;
        int     numberOfCharacters,
                vowelCount = 0;
        char    letter;

        name = JOptionPane.showInputDialog(null, "What is your name?");
        numberOfCharacters = name.length();
        nameUpper = name.toUpperCase();

        for (int i = 0; i < numberOfCharacters; i++) {
            letter = nameUpper.charAt(i);

            if ( letter == 'A' ||
                 letter == 'E' ||
                 letter == 'I' ||
                 letter == 'O' ||
                 letter == 'U'    ) {
                vowelCount++;
            }
        }

        JOptionPane.showMessageDialog(null, name + ", your name has " +
                                            vowelCount + " vowels");
    }
}
Notice that the original string name is unchanged. A new, converted string is
returned from the toUpperCase method and assigned to the second String variable
nameUpper.
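A short program can confirm that toUpperCase leaves the original string untouched. The sketch below also counts vowels, but it tests each character with "AEIOU".indexOf rather than a chain of == comparisons; that is a substitute technique of ours, not the one used in the sample programs, and the names UpperCaseDemo and countVowels are hypothetical.

```java
// Illustrative sketch: toUpperCase returns a new string; the original is unchanged.
class UpperCaseDemo {

    // Counts the vowels in text by checking each uppercased character
    // against the string "AEIOU". indexOf returns -1 when the character
    // is not found. (Hypothetical helper.)
    static int countVowels(String text) {
        String upper = text.toUpperCase();
        int count = 0;
        for (int i = 0; i < upper.length(); i++) {
            if ("AEIOU".indexOf(upper.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String name      = "Sumatra";
        String nameUpper = name.toUpperCase();

        System.out.println(name);               // Sumatra (unchanged)
        System.out.println(nameUpper);          // SUMATRA
        System.out.println(countVowels(name));  // 3
    }
}
```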
Let’s try another example. This time we read in a string and count how many
words the string contains. For this example we consider a word as a sequence of characters separated, or delimited, by blank spaces. We treat punctuation marks and other
symbols as part of a word. Expressing the task in pseudocode, we have the following:
    read in a sentence;
    while (there are more characters in the sentence) {
        look for the beginning of the next word;
        now look for the end of this word;
        increment the word counter;
    }
We use a while loop here instead of do–while to handle the case when the
input sentence contains no characters, that is, when it is an empty string. Let’s implement the routine. Here’s our first attempt:
    //Attempt No. 1 (Bad Version)
    static final char BLANK = ' ';

    int     index, wordCount, numberOfCharacters;

    String  sentence = JOptionPane.showInputDialog(null,
                                        "Enter a sentence:");

    numberOfCharacters = sentence.length();
    index     = 0;
    wordCount = 0;

    while (index < numberOfCharacters ) {

        //ignore blank spaces: skip blank spaces until a character that
        //is not a blank space is encountered. This is the beginning of
        //a word.
        while (sentence.charAt(index) == BLANK) {
            index++;
        }

        //now locate the end of the word: once the beginning of a word
        //is detected, we skip nonblank characters until a blank space
        //is encountered. This is the end of the word.
        while (sentence.charAt(index) != BLANK) {
            index++;
        }

        //another word has been found, so increment the counter
        wordCount++;
    }
This implementation has a problem. The counter variable index is incremented inside the two inner while loops, and this index could become equal to
numberOfCharacters, which is an error, because the position of the last character is
numberOfCharacters – 1. We need to modify the two while loops so that index will
not become larger than numberOfCharacters –1. Here’s the modified code:
/*
    Chapter 9 Sample Program: Count the number of words
                              in a given string
    File: Ch9CountWords.java (Attempt 2)
*/
import javax.swing.*;

class Ch9CountWords { //Attempt 2

    private static final char BLANK = ' ';

    public static void main (String[] args) {
        int     index, wordCount, numberOfCharacters;

        String  sentence = JOptionPane.showInputDialog(null,
                                            "Enter a sentence:");

        numberOfCharacters = sentence.length( );
        index              = 0;
        wordCount          = 0;

        while ( index < numberOfCharacters ) {

            //ignore blank spaces
            while (index < numberOfCharacters &&
                   sentence.charAt(index) == BLANK) {
                index++;
            }

            //now locate the end of the word
            while (index < numberOfCharacters &&
                   sentence.charAt(index) != BLANK) {
                index++;
            }

            //another word is found, so increment the counter
            wordCount++;
        }

        //display the result
        System.out.println( "Input sentence: " + sentence );
        System.out.println("\n");
        System.out.println( " Word count: " + wordCount + " words" );
    }
}
Notice that the order of comparisons in the boolean expression
    index < numberOfCharacters
        && sentence.charAt(index) == BLANK
is critical. If we switch the order to
    sentence.charAt(index) == BLANK
        && index < numberOfCharacters
and if the last character in the string is a space, then an out-of-bound exception will
occur because the value of index is a position that does not exist in the string sentence.
By putting the expression correctly as
    index < numberOfCharacters && sentence.charAt(index) != BLANK
we will not get an out-of-bound exception because the boolean operator && is a short-circuit operator. If the relation index < numberOfCharacters is false, then the second
half of the expression, sentence.charAt(index) != BLANK, will not get evaluated.
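To see the effect of the operand order, we can place the two versions of the loop side by side. This is a sketch under our own naming: the class ShortCircuitDemo and the methods skipBlanks and skipBlanksUnsafe are hypothetical, and the input string is made up so that it ends with blank spaces.

```java
// Illustrative sketch: why the bounds check must come first with &&.
class ShortCircuitDemo {

    private static final char BLANK = ' ';

    // Safe order: the bounds check is evaluated first, so charAt is never
    // called with an index equal to the length of the string.
    static int skipBlanks(String sentence, int index) {
        while (index < sentence.length() &&
               sentence.charAt(index) == BLANK) {
            index++;
        }
        return index;
    }

    // Unsafe order: charAt is evaluated before the bounds check, so it
    // throws an exception when index reaches the end of the string.
    static int skipBlanksUnsafe(String sentence, int index) {
        while (sentence.charAt(index) == BLANK &&
               index < sentence.length()) {
            index++;
        }
        return index;
    }

    public static void main(String[] args) {
        String sentence = "Java  ";   // four letters followed by two blanks

        System.out.println(skipBlanks(sentence, 4));   // 6

        try {
            skipBlanksUnsafe(sentence, 4);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("out-of-bound exception");
        }
    }
}
```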
There is still a problem with the attempt 2 code. If the sentence ends with one
or more blank spaces, then the value for wordCount will be 1 more than the actual
number of words in the sentence. It is left as an exercise to correct this bug (see
Exercise 15 at the end of the chapter).
Our third example counts the number of times the word Java occurs in the
input. The repetition stops when the word STOP is read. Lowercase and uppercase letters are not distinguished when an input word is compared to Java, but the word STOP
for terminating the loop must be in all uppercase letters. Here’s the pseudocode:
    javaCount = 0;
    while (true) {
        read in next word;
        if (word is "STOP") {
            break;
        } else if (word is "Java" ignoring cases) {
            javaCount++;
        }
    }
And here’s the actual code. Pay close attention to how the strings are
compared.
/*
    Chapter 9 Sample Program:
        Count the number of times the word 'java' occurs
        in input. Case-insensitive comparison is used here.
        The program terminates when the word STOP (case-sensitive)
        is entered.
    File: Ch9CountJava.java
*/
import javax.swing.*;

class Ch9CountJava {

    public static void main (String[] args) {
        int     javaCount = 0;
        String  word;

        while (true) {
            word = JOptionPane.showInputDialog(null, "Next word:");

            if ( word.equals("STOP") ) {
                break;
            } else if ( word.equalsIgnoreCase("Java") ) {
                javaCount++;
            }
        }

        System.out.println("'Java' count: " + javaCount );
    }
}
String comparison is done by two methods—equals and equalsIgnoreCase—
whose meanings should be clear from the example. Another comparison method is
compareTo. This method compares two String objects str1 and str2 as in
str1.compareTo( str2 );
and returns 0 if they are equal, a negative integer if str1 is less than str2, and a positive integer if str1 is greater than str2. The comparison is based on the lexicographic
order of Unicode. For example, caffeine is less than latte. Also, the string jaVa is
less than the string java because the Unicode value of V is smaller than the Unicode
value of v. (See the ASCII table, Table 9.1.)
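The following sketch tries compareTo on the strings mentioned above. Because the method promises only the sign of its result, the hypothetical helper sign (our own name, as is the class CompareToDemo) reduces the result to -1, 0, or 1 before printing.

```java
// Illustrative sketch: the sign of compareTo for a few pairs of strings.
class CompareToDemo {

    // Reduces the result of compareTo to -1, 0, or 1 (hypothetical helper).
    static int sign(String str1, String str2) {
        return Integer.signum(str1.compareTo(str2));
    }

    public static void main(String[] args) {
        System.out.println(sign("caffeine", "latte"));  // -1: caffeine is less than latte
        System.out.println(sign("jaVa", "java"));       // -1: V (86) is smaller than v (118)
        System.out.println(sign("java", "java"));       //  0: the strings are equal
        System.out.println(sign("latte", "caffeine"));  //  1: latte is greater than caffeine
    }
}
```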
Some of you may be wondering why we don’t say
if ( word == "STOP" )
We can, in fact, use the equality comparison symbol == to compare two String objects, but the result is different from the result of the method equals. We will explain
the difference in Section 9.5.
Let’s try another example, using the substring method we introduced in Chapter 2. To refresh our memory, here’s how the method works. If str is a String object,
then the expression
str.substring ( beginIndex, endIndex )
returns a new string that is a substring of str from position beginIndex to endIndex – 1.
The values of beginIndex and endIndex must both be between 0 and str.length(). In
addition, the value of beginIndex must be less than or equal to the value of endIndex.
Passing invalid values for beginIndex or endIndex will result in a runtime error
(a StringIndexOutOfBoundsException).
The following code creates a new string Javanist from Alpinist by using the
substring method.
String oldWord = "Alpinist";
String newWord = "Java" + oldWord.substring(4,8);
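Here is a runnable sketch of the same idea, together with one invalid call to show the runtime error. The class name SubstringDemo and the helper replacePrefix are hypothetical names of ours.

```java
// Illustrative sketch: building a new word with substring, and an invalid call.
class SubstringDemo {

    // Replaces the first prefixLength characters of word with newPrefix.
    // (Hypothetical helper.)
    static String replacePrefix(String word, int prefixLength, String newPrefix) {
        return newPrefix + word.substring(prefixLength, word.length());
    }

    public static void main(String[] args) {
        String oldWord = "Alpinist";

        System.out.println(oldWord.substring(4, 8));            // nist
        System.out.println(replacePrefix(oldWord, 4, "Java"));  // Javanist
        System.out.println(oldWord);                            // Alpinist (unchanged)

        try {
            oldWord.substring(5, 2);   // beginIndex is greater than endIndex
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("invalid index values");
        }
    }
}
```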
In this example, we print out the words from a given sentence, using one line
per word. For example, given an input sentence
I want to be a Java programmer
the code will print out
I
want
to
be
a
Java
programmer
This sample code is similar to the previous one that counts the number of
words in a given sentence. Instead of just counting the words, we need to extract the
word from the sentence and print it out. Here’s how we write the code:
/*
    Chapter 9 Sample Program:
        Extract the words in a given sentence and
        print them, using one line per word.
    File: Ch9ExtractWords.java
*/
import javax.swing.*;

class Ch9ExtractWords {

    private static final char BLANK = ' ';

    public static void main (String[] args) {
        int     index,
                numberOfCharacters,
                beginIdx, endIdx;

        String  word,
                sentence = JOptionPane.showInputDialog(null, "Input:");

        numberOfCharacters = sentence.length();
        index = 0;

        while ( index < numberOfCharacters ) {

            //ignore leading blank spaces
            while (index < numberOfCharacters &&
                   sentence.charAt(index) == BLANK) {
                index++;
            }

            beginIdx = index;

            //now locate the end of the word
            while (index < numberOfCharacters &&
                   sentence.charAt(index) != BLANK) {
                index++;
            }

            endIdx = index;

            if (beginIdx != endIdx) {
                //another word is found, extract it from the
                //sentence and print it out
                word = sentence.substring( beginIdx, endIdx );
                System.out.println(word);
            }
        }
    }
}
Notice the significance of the test
if (beginIdx != endIdx)
in the code. For what kinds of input sentences will the variables beginIdx and
endIdx be equal? We’ll leave this as an exercise (see Exercise 16 at the end of the
chapter).
1. Determine the output of the following code.
a. String str = "Programming";
   for (int i = 0; i < 9; i+=2) {
       System.out.print( str.charAt( i ) );
   }
b. String str = "World Wide Web";
   for (int i = 0; i < 10; i ++ ) {
       if ( str.charAt(i) == 'W') {
           System.out.println( 'M' );
       } else {
           System.out.print( str.charAt(i) );
       }
   }
2. Write a loop that prints out a string in reverse. If the string is Hello, then the
code outputs olleH. Use System.out.
3. Assume two String objects str1 and str2 are initialized as follows:
String str1 = "programming";
String str2 = "language";
Determine the value of each of the following expressions if they are valid. If
they are not valid, state the reason why.
a. str1.compareTo( str2 )
b. str2.compareTo( str2 )
c. str2.substring( 1, 1 )
d. str2.substring( 0, 7 )
e. str2.charAt( 11 )
f. str1.length( ) + str2.length( )
4. What is the difference between the two String methods equals and
equalsIgnoreCase?
9.3 Pattern Matching and Regular Expression
One sample code from Section 9.2 searched for the word Java in a given string. This
sample code illustrated a very simplified version of a well-known problem called
pattern matching. Word processor features such as finding a text and replacing a text
with another text are two specialized cases of a pattern-matching problem. Because
pattern matching is so common in many applications, two new classes, Pattern and
Matcher, were added in Java 2 SDK 1.4. The String class was also modified to
include several new methods that support pattern matching.
The matches Method
Let’s begin with the matches method from the String class. In its simplest form, it
looks very similar to the equals method. For example, given a string str, the two
statements
str.equals("Hello");
str.matches("Hello");
both evaluate to true if str is the string Hello. However, they are not truly equivalent,
because, unlike equals, the argument to the matches method can be a pattern, a feature that brings great flexibility and power to the matches method.
Suppose we assign a three-digit code to all incoming students. The first digit
represents the major, and 5 stands for the computer science major. The second digit
represents the home state: 1 is for in-state students, 2 is for out-of-state students, and
3 is for foreign students. And the third digit represents the residence of the student.
On-campus dormitories are represented by digits from 1 through 7. Students living
off campus are represented by digit 8. For example, the valid encodings for students
majoring in computer science and living off campus are 518, 528, and 538. The
valid three-digit code for computer science majors living in one of the on-campus
dormitories can be expressed succinctly as
5[123][1-7]
and here’s how we interpret the pattern:
    5        First digit. It must be 5 for the computer science majors.
    [123]    Second digit. It must be 1, 2, or 3.
    [1-7]    Third digit. It must be any digit from 1 to 7.
The pattern is called a regular expression that allows us to denote a large (often infinite) set of words succinctly. The “word” is composed of any sequence of symbols
and is not limited to alphabets. The brackets [ ] are used here to represent choices,
so [123] means 1, 2, or 3. We can use the notation for alphabets also. For example,
[aBc] means a, B, or c. Notice the notation is case-sensitive. The hyphen in the brackets shows the range, so [1-7] means any digit from 1 to 7. If we want to allow any
lowercase letter, then the regular expression will be [a-z]. The hat symbol ^ is used
for negation. For example, [^abc] means any character except a, b, or c. Notice that
this expression does not restrict the character to lowercase letters; it can be any
character including digits and symbols. To refer to all lowercase letters except a, b,
or c, the correct expression is [a-z&&[^abc]]. The double ampersand represents an
intersection. Here are more examples:
Expression        Description
[013]             A single digit 0, 1, or 3.
[0-9][0-9]        Any two-digit number from 00 to 99.
A[0-4]b[05]       A string that consists of four characters. The first
                  character is A. The second character is 0, 1, 2, 3, or 4.
                  The third character is b. And the last character is
                  either 0 or 5.
[0-9&&[^4567]]    A single digit that is 0, 1, 2, 3, 8, or 9.
[a-z0-9]          A single character that is either a lowercase letter or a
                  digit.
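We can try these bracket expressions directly with the matches method. The sketch below is only illustrative: the class name CharacterClassDemo and the helper isDormCode are ours, and the test strings are made up.

```java
// Illustrative sketch: bracket expressions used with String.matches.
class CharacterClassDemo {

    // True if code is a valid three-digit code for a computer science
    // major living in an on-campus dormitory. (Hypothetical helper.)
    static boolean isDormCode(String code) {
        return code.matches("5[123][1-7]");
    }

    public static void main(String[] args) {
        System.out.println(isDormCode("527"));                  // true
        System.out.println(isDormCode("518"));                  // false: 8 means off campus

        System.out.println("d".matches("[a-z&&[^abc]]"));       // true
        System.out.println("b".matches("[a-z&&[^abc]]"));       // false: b is excluded
        System.out.println("8".matches("[0-9&&[^4567]]"));      // true
        System.out.println("07".matches("[0-9][0-9]"));         // true
        System.out.println("A3b5".matches("A[0-4]b[05]"));      // true
        System.out.println("a3b5".matches("A[0-4]b[05]"));      // false: case-sensitive
    }
}
```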
We can use repetition symbols * or + to designate a sequence of unbounded
length. The symbol * means 0 or more times, and the symbol + means 1 or more times.
Let’s try an example using a repetition symbol. Remember the definition for a valid
Java identifier? We define it as a sequence of alphanumeric characters, underscores,
and dollar signs, with the first character being a letter. In regular expression, we
can state this definition as
[a-zA-Z][a-zA-Z0-9_$]*
Let’s write a short program that will input a word and determine whether it is
a valid Java identifier. The program stops when the word entered is STOP. Here’s the
program:
/*
    Chapter 9 Sample Program: Checks whether the input
                              string is a valid identifier.
    File: Ch9MatchJavaIdentifier.java
*/
import javax.swing.*;

class Ch9MatchJavaIdentifier {

    private static final String STOP    = "STOP";
    private static final String VALID   = "Valid Java identifier";
    private static final String INVALID = "Not a valid Java identifier";

    private static final String VALID_IDENTIFIER_PATTERN
                                        = "[a-zA-Z][a-zA-Z0-9_$]*";

    public static void main (String[] args) {
        String str, reply;

        while (true) {
            str = JOptionPane.showInputDialog(null, "Identifier:");

            if (str.equals(STOP)) break;

            if (str.matches(VALID_IDENTIFIER_PATTERN)) {
                reply = VALID;
            } else {
                reply = INVALID;
            }

            JOptionPane.showMessageDialog(null,
                                          str + ":\n" + reply);
        }
    }
}
It is also possible to designate a sequence of fixed length. For example, to specify four-digit numbers, we write [0-9]{4}. The number in the braces { and } denotes
the number of repetitions. We can specify the minimum and maximum numbers of
repetitions also. Here are the rules:
Expression    Description
X{N}          Repeat X exactly N times, where X is a regular
              expression for a single character.
X{N,}         Repeat X at least N times.
X{N,M}        Repeat X at least N but no more than M times.
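The three forms can be compared with a small test program. This is a sketch with made-up input strings; the class name RepetitionDemo and the helper isFourDigits are hypothetical names of ours.

```java
// Illustrative sketch: the {N}, {N,}, and {N,M} repetition forms.
class RepetitionDemo {

    // True if text consists of exactly four digits (hypothetical helper).
    static boolean isFourDigits(String text) {
        return text.matches("[0-9]{4}");
    }

    public static void main(String[] args) {
        // X{N}: exactly N times
        System.out.println(isFourDigits("2005"));             // true
        System.out.println(isFourDigits("205"));              // false: only three digits

        // X{N,}: at least N times
        System.out.println("12345".matches("[0-9]{2,}"));     // true
        System.out.println("1".matches("[0-9]{2,}"));         // false: fewer than two digits

        // X{N,M}: at least N but no more than M times
        System.out.println("123".matches("[0-9]{2,3}"));      // true
        System.out.println("1234".matches("[0-9]{2,3}"));     // false: more than three digits
    }
}
```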
Here’s an example of using a sequence of fixed length. Suppose we want to
determine whether the input string represents a valid phone number that follows the
pattern of
xxx-xxx-xxxx
where x is a single digit from 0 through 9. The following is a program that inputs a
string continually and replies whether the input string conforms to the pattern. The
program terminates when a single digit 0 is entered. Structurally this program is
identical to the Ch9MatchJavaIdentifier class. Here’s the program:
/*
    Chapter 9 Sample Program: Checks whether the input
                              string conforms to the phone number
                              pattern xxx-xxx-xxxx.
    File: Ch9MatchPhoneNumber.java
*/
import javax.swing.*;

class Ch9MatchPhoneNumber {

    private static final String STOP    = "0";
    private static final String VALID   = "Valid phone number";
    private static final String INVALID = "Not a valid phone number";

    private static final String VALID_PHONE_PATTERN
                                        = "[0-9]{3}-[0-9]{3}-[0-9]{4}";

    public static void main (String[] args) {
        String phoneStr, reply;

        while (true) {
            phoneStr = JOptionPane.showInputDialog(null, "Phone#:");

            if (phoneStr.equals(STOP)) break;

            if (phoneStr.matches(VALID_PHONE_PATTERN)) {
                reply = VALID;
            } else {
                reply = INVALID;
            }

            JOptionPane.showMessageDialog(null,
                                          phoneStr + ":\n" + reply);
        }
    }
}
Suppose, with the proliferation of cell phones, the number of digits used for a
prefix increases from three to four in major cities. (In fact, Tokyo now uses a four-digit prefix. Phenomenal growth in the use of fax machines in both offices and
homes caused the increase from three to four digits.) The valid format for phone
numbers then becomes
xxx-xxx-xxxx
or
xxx-xxxx-xxxx
This change can be handled effortlessly by defining VALID_PHONE_PATTERN as
private static final String VALID_PHONE_PATTERN
= "[0-9]{3}-[0-9]{3,4}-[0-9]{4}";
This is the power of regular expressions and pattern-matching methods. All we
need to do is to make one simple adjustment to the regular expression. No other
changes are made to the program. Had we written the program without using the
pattern-matching technique (i.e., written the program using repetition control to test
the first to the last character individually), changing the code to handle both a three-digit and a four-digit prefix would require substantially greater effort.
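To see the effect of the {3,4} adjustment without the dialog boxes, here is a small sketch that applies the revised pattern to a few inputs. The class name Ch9PhonePatternDemo, the method isValidPhone, and the sample numbers are made up for this illustration.

```java
class Ch9PhonePatternDemo {

    // Revised pattern: the prefix may have three or four digits
    private static final String VALID_PHONE_PATTERN
                                = "[0-9]{3}-[0-9]{3,4}-[0-9]{4}";

    static boolean isValidPhone(String phoneStr) {
        return phoneStr.matches(VALID_PHONE_PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(isValidPhone("408-555-1234"));   // true  (three-digit prefix)
        System.out.println(isValidPhone("408-5555-1234"));  // true  (four-digit prefix)
        System.out.println(isValidPhone("408-55-1234"));    // false (prefix too short)
        System.out.println(isValidPhone("408-55555-1234")); // false (prefix too long)
    }
}
```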
The period symbol (.) is used to match any character except a line terminator
such as \n or \r. (By using the Pattern class, we can make it match a line terminator
also. We discuss more details on the Pattern class later.) We can use the period
symbol with the zero-or-more-times notation * to check if a given string contains a
sequence of characters we are looking for. For example, suppose a String object
document holds the content of some document, and we want to check if the phrase
“zen of objects” is in it. We can do it as follows:
String document;
document = ...; //assign text to 'document'
if (document.matches(".*zen of objects.*")) {
    System.out.println("Found");
} else {
    System.out.println("Not found");
}
The brackets [ and ] are used for expressing a range of choices for a single
character. If we need to express a range of choices for multiple characters, then we
use the parentheses and the vertical bar. For example, if we search for the word
maximum or minimum, we express the pattern as
(max|min)imum
Here are some more examples:
Expression            Description
[wb](ad|eed)          Matches wad, weed, bad, and beed.
(pro|anti)-OOP        Matches pro-OOP and anti-OOP.
(AZ|CA|CO)[0-9]{4}    Matches AZxxxx, CAxxxx, and COxxxx, where x is a single digit.
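The patterns in the table can be checked with a few calls to matches. This is only a sketch; the class name Ch9AlternationDemo and the helper methods are hypothetical.

```java
class Ch9AlternationDemo {

    // Parentheses and the vertical bar choose between multi-character alternatives
    static boolean isMaxOrMin(String s) {
        return s.matches("(max|min)imum");
    }

    // Brackets choose a single character; parentheses choose a sequence
    static boolean isWadFamily(String s) {
        return s.matches("[wb](ad|eed)");
    }

    // A two-letter state code followed by exactly four digits
    static boolean isStateCode(String s) {
        return s.matches("(AZ|CA|CO)[0-9]{4}");
    }

    public static void main(String[] args) {
        System.out.println(isMaxOrMin("maximum"));  // true
        System.out.println(isMaxOrMin("optimum"));  // false
        System.out.println(isWadFamily("weed"));    // true
        System.out.println(isWadFamily("wed"));     // false
        System.out.println(isStateCode("CA1234"));  // true
        System.out.println(isStateCode("NY1234"));  // false
    }
}
```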
The replaceAll Method
The second method new to the version 1.4 String class is the replaceAll method.
Using this method, we can replace all occurrences of a substring that matches a
given regular expression with a given replacement string. For example, here’s how
to replace all vowels in the string with the @ symbol:
String originalText, modifiedText;
originalText = ...; //assign string to 'originalText'
modifiedText = originalText.replaceAll("[aeiou]", "@");
Notice the original text is unchanged. The replaceAll method returns a modified text
as a separate string. Here are more examples:
Expression                               Description
str.replaceAll("OOP",                    Replace all occurrences of OOP with
    "object-oriented programming")       object-oriented programming.
str.replaceAll(                          Replace all social security numbers
    "[0-9]{3}-[0-9]{2}-[0-9]{4}",        with xxx-xx-xxxx.
    "xxx-xx-xxxx")
str.replaceAll("o{2,}", "oo")            Replace all occurrences of a sequence
                                         that has two or more of letter o with oo.
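The following sketch runs three of these replacements on sample strings and shows that the original string is left untouched. The class name Ch9ReplaceAllDemo, the helper methods, and the sample inputs are assumptions made for illustration.

```java
class Ch9ReplaceAllDemo {

    static String maskVowels(String s) {
        return s.replaceAll("[aeiou]", "@");
    }

    static String maskSsn(String s) {
        return s.replaceAll("[0-9]{3}-[0-9]{2}-[0-9]{4}", "xxx-xx-xxxx");
    }

    static String squeezeOs(String s) {
        return s.replaceAll("o{2,}", "oo");
    }

    public static void main(String[] args) {
        String originalText = "Hello, World";
        String modifiedText = maskVowels(originalText);
        System.out.println(modifiedText);                     // H@ll@, W@rld
        System.out.println(originalText);                     // Hello, World (unchanged)
        System.out.println(maskSsn("SSN: 123-45-6789"));      // SSN: xxx-xx-xxxx
        System.out.println(squeezeOs("Noooo way, too cool")); // Noo way, too cool
    }
}
```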
If we want to match only the whole word, we have to use the \b symbol to designate the word boundary. Suppose we write
str.replaceAll("temp", "temporary");
expecting to replace all occurrences of the abbreviated word temp by temporary.
We will get a surprising result. All occurrences of the sequence of characters temp
will be replaced; so, for example, words such as attempt or tempting would be replaced by attemporaryt or temporaryting, respectively. To designate the sequence
temp as a whole word, we place the word boundary symbol \b at the front and the end
of the sequence.
str.replaceAll("\\btemp\\b", "temporary");
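The difference between the two calls can be seen side by side in the sketch below. The class name Ch9WordBoundaryDemo, its methods, and the sample sentence are hypothetical.

```java
class Ch9WordBoundaryDemo {

    // Replaces every occurrence of the character sequence temp
    static String replaceAnywhere(String s) {
        return s.replaceAll("temp", "temporary");
    }

    // Replaces temp only where it stands as a whole word
    static String replaceWholeWord(String s) {
        return s.replaceAll("\\btemp\\b", "temporary");
    }

    public static void main(String[] args) {
        String str = "temp file, attempt";
        System.out.println(replaceAnywhere(str));  // temporary file, attemporaryt
        System.out.println(replaceWholeWord(str)); // temporary file, attempt
    }
}
```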
Notice the use of two backslashes. The symbol we use in the regular expression is \b. However, we must write this regular expression in a String representation,
and in a string literal the backslash introduces an escape sequence for a control character
such as \n, \t, and \r. (In fact, \b by itself in a string literal stands for the backspace character.) To specify a regular expression symbol that begins with a backslash, we
must use an additional backslash, so the compiler will not interpret it as a
control character. The regular expression we want here is
\btemp\b
To put it in a String representation, we write
"\\btemp\\b"
Here are the common backslash symbols used in regular expressions:
Expression    String Representation    Description
\d            "\\d"                    A single digit. Equivalent to [0-9].
\D            "\\D"                    A single nondigit. Equivalent to [^0-9].
\s            "\\s"                    A white space character, such as space, tab, new line, etc.
\S            "\\S"                    A non-white-space character.
\w            "\\w"                    A word character. Equivalent to [a-zA-Z_0-9].
\W            "\\W"                    A nonword character.
\b            "\\b"                    A word boundary (the position between a word character and a
                                       nonword character such as a white space or punctuation mark).
\B            "\\B"                    A nonword boundary.
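A few of these symbols are exercised in the sketch below, written in their String representation. The class name Ch9BackslashSymbolsDemo, the method names, and the sample strings are made up for this example.

```java
class Ch9BackslashSymbolsDemo {

    // \d: three digits, a hyphen, then four digits
    static boolean isShortPhone(String s) {
        return s.matches("\\d{3}-\\d{4}");
    }

    // \s: collapse each run of white space into a single space
    static String normalizeSpaces(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // \W: delete every nonword character
    static String keepWordCharacters(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(isShortPhone("555-1234"));            // true
        System.out.println(normalizeSpaces("a  b\tc"));          // a b c
        System.out.println(keepWordCharacters("hello, world!")); // helloworld
    }
}
```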
We also use the backslash if we want to search for a command character. For
example, the plus symbol designates one or more repetitions. If we want to search
for the plus symbol in the text, we use the backslash as \+ and to express it as a
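string, we write "\\+". Here's an example. To replace all occurrences of C and C++
(not necessarily a whole word) with Java, we write
str.replaceAll("(C\\+\\+|C)", "Java");
The longer alternative C\+\+ is listed first because alternatives are tried from left to right. With (C|C\+\+), the first alternative would match the C of C++, and C++ would become Java++.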
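The order of the alternatives inside the parentheses affects the result, which a small experiment makes visible. The class name Ch9AlternationOrderDemo and its two methods are hypothetical; the sketch only compares the two orderings on one sample string.

```java
class Ch9AlternationOrderDemo {

    // Shorter alternative first: the C of C++ is matched by itself
    static String shortFirst(String s) {
        return s.replaceAll("(C|C\\+\\+)", "Java");
    }

    // Longer alternative first: C++ is matched as a unit
    static String longFirst(String s) {
        return s.replaceAll("(C\\+\\+|C)", "Java");
    }

    public static void main(String[] args) {
        String str = "C and C++";
        System.out.println(shortFirst(str)); // Java and Java++
        System.out.println(longFirst(str));  // Java and Java
    }
}
```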
1. Describe the string that the following regular expressions match.
a. a*b
b. b[aiu]d
c. [Oo]bject(s| )
2. Write a regular expression for a state vehicle license number whose format is a
single capital letter, followed by three digits and four lowercase letters.
3. Which of the following regular expressions are invalid?
a. (a-z)*+
b. [a|ab]xyz
c. abe-14
d. [a-z&&^a^b]
e. [//one]two
9.4 The Pattern and Matcher Classes
The matches and replaceAll methods of the String class are shorthand for using the
Pattern and Matcher classes from the java.util.regex package. We will describe how
to use these two classes for more efficient pattern matching.
The statement
str.matches(regex);
where str and regex are String objects is equivalent to
Pattern.matches(regex, str);
which in turn is equivalent to
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
matcher.matches();
Similarly, the statement
str.replaceAll(regex, replacement);
where replacement is a replacement text is equivalent to
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
matcher.replaceAll(replacement);
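The equivalences above can be confirmed by running the short and long forms on the same input. In this sketch the class name Ch9EquivalenceDemo and the two helper methods are hypothetical; the Pattern and Matcher calls are the ones listed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ch9EquivalenceDemo {

    // Long form of str.matches(regex)
    static boolean matchesLongForm(String regex, String str) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(str);
        return matcher.matches();
    }

    // Long form of str.replaceAll(regex, replacement)
    static String replaceAllLongForm(String regex, String str, String replacement) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(str);
        return matcher.replaceAll(replacement);
    }

    public static void main(String[] args) {
        String regex = "[0-9]{4}";
        String str   = "2005";

        // All three forms give the same answer
        System.out.println(str.matches(regex));           // true
        System.out.println(Pattern.matches(regex, str));  // true
        System.out.println(matchesLongForm(regex, str));  // true

        String text = "banana";
        System.out.println(text.replaceAll("a", "o"));          // bonono
        System.out.println(replaceAllLongForm("a", text, "o")); // bonono
    }
}
```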
Explicit creation of Pattern and Matcher objects gives us more options and
greater efficiency. We specify regular expressions as strings, but for the system to
actually carry out the pattern-matching operation, the stated regular expression must
first be converted to an internal format. This is done by the compile method of the
Pattern class. When we use the matches method of the String or Pattern class, this
conversion into the internal format is carried out every time the matches method is
executed. So if we use the same pattern multiple times, then it is more efficient to
convert just once, instead of repeating the same conversion, as was the case for
the Ch9MatchJavaIdentifier and Ch9MatchPhoneNumber classes. The following is
Ch9MatchJavaIdentifier2, a more efficient version of Ch9MatchJavaIdentifier:
/*
Chapter 9 Sample Program: Checks whether the input
string is a valid identifier. This version
uses the Matcher and Pattern classes.
File: Ch9MatchJavaIdentifier2.java
*/
import javax.swing.*;
import java.util.regex.*;
class Ch9MatchJavaIdentifier2 {
    private static final String STOP    = "STOP";
    private static final String VALID   = "Valid Java identifier";
    private static final String INVALID = "Not a valid Java identifier";
    private static final String VALID_IDENTIFIER_PATTERN
                                        = "[a-zA-Z][a-zA-Z0-9_$]*";
    public static void main (String[] args) {
        String  str, reply;
        Matcher matcher;
        Pattern pattern
                    = Pattern.compile(VALID_IDENTIFIER_PATTERN);
        while (true) {
            str = JOptionPane.showInputDialog(null, "Identifier:");
            if (str.equals(STOP)) break;
            matcher = pattern.matcher(str);
            if (matcher.matches()) {
                reply = VALID;
            } else {
                reply = INVALID;
            }
            JOptionPane.showMessageDialog(null, str + ":\n" + reply);
        }
    }
}
We have a number of options when a Pattern is compiled into its internal format.
For example, by default, the period symbol does not match the line terminator character. We can override this default by passing DOTALL as the second argument as
Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
To enable case-insensitive matching, we pass the CASE_INSENSITIVE constant.
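The sketch below tries both constants, and also combines them with the bitwise OR operator |, which is how more than one option is passed to compile. The class name Ch9PatternFlagsDemo and the helper method matchesWith are hypothetical; passing 0 as the second argument means that no option is set.

```java
import java.util.regex.Pattern;

class Ch9PatternFlagsDemo {

    // Compiles regex with the given options and tests str against it
    static boolean matchesWith(String regex, int flags, String str) {
        return Pattern.compile(regex, flags).matcher(str).matches();
    }

    public static void main(String[] args) {
        // By default the period does not match a line terminator
        System.out.println(matchesWith("a.b", 0, "a\nb"));               // false
        System.out.println(matchesWith("a.b", Pattern.DOTALL, "a\nb"));  // true

        // Case-insensitive matching
        System.out.println(matchesWith("java", 0, "JaVa"));                        // false
        System.out.println(matchesWith("java", Pattern.CASE_INSENSITIVE, "JaVa")); // true

        // Both options at once
        int both = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
        System.out.println(matchesWith("java.*", both, "Java\nrocks"));  // true
    }
}
```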
The find method is another powerful method of the Matcher class. This method
searches for the next sequence in a string that matches the pattern. The method returns true if the pattern is found. We can call the method repeatedly until it returns
false to find all matches. Here’s an example that counts the number of times the word
java occurs in a given document. We will search for the word in a case-insensitive
manner.
/*
    Chapter 9 Sample Program:
        Count the number of times the word 'java' occurs
        in input using pattern-matching technique.
    File: Ch9PMCountJava.java
*/
import javax.swing.*;
import java.util.regex.*;
class Ch9PMCountJava {
    public static void main (String[] args) {
        String  document;
        int     javaCount;
        Matcher matcher;
        Pattern pattern = Pattern.compile("java",
                                          Pattern.CASE_INSENSITIVE);
        document  = JOptionPane.showInputDialog(null, "Sentence:");
        javaCount = 0;
        matcher   = pattern.matcher(document);
        while (matcher.find()) {
            javaCount++;
        }
        JOptionPane.showMessageDialog(null,
                    "The word 'java' occurred " +
                    javaCount + " times.");
    }
}
When a matcher finds a matching sequence of characters, we can query the
location of the sequence by using the start and end methods. The start method returns the position in the string where the first character of the pattern is found, and
the end method returns the value 1 more than the position in the string where the
last character of the pattern is found. Here’s the code that prints out the matching
sequences and their locations in the string when searching for the word java in a
case-insensitive manner.
/*
    Chapter 9 Sample Program:
        Displays the positions where the word 'java' occurs
        in a given string using pattern-matching technique.
    File: Ch9PMCountJava2.java
*/
import javax.swing.*;
import java.util.regex.*;
class Ch9PMCountJava2 {
    public static void main (String[] args) {
        String  document;
        Matcher matcher;
        Pattern pattern = Pattern.compile("java",
                                          Pattern.CASE_INSENSITIVE);
        document = JOptionPane.showInputDialog(null, "Sentence:");
        matcher  = pattern.matcher(document);
        while (matcher.find()) {
            System.out.println(document.substring(matcher.start(),
                                                  matcher.end())
                               + " found at position "
                               + matcher.start());
        }
    }
}
1. Replace the following statements with the equivalent ones using the Pattern
and Matcher classes.
a. str.replaceAll("1", "one");
b. str.matches("alpha");
2. Using the find method of the Matcher class, check if the given string
document contains the whole word Java.
9.5 Comparing Strings
We already discussed how objects are compared in Chapter 5. The same rule applies
for the string, but we have to be careful in certain situations because of the difference in the way a new String object is created. First, we will review how the objects
are compared. The difference between
String word1, word2;
...
if ( word1 == word2 ) ...
and
if ( word1.equals(word2) ) ...
is illustrated in Figure 9.2. The equality test == is true if the contents of variables
are the same. For a primitive data type, the contents are values themselves; but for
a reference data type, the contents are addresses. So for a reference data type, the
equality test is true if both variables refer to the same object, because they both
contain the same address. The equals method, on the other hand, is true if the String
objects to which the two variables refer contain the same string value. To distinguish the two types of comparisons, we will use the term equivalence test for the
equals method.
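The two tests can be compared on concrete String objects. In the sketch below, the class name Ch9CompareDemo and the methods sameObject and sameValue are hypothetical names; the strings are created with the new operator so that each variable refers to its own object unless it is assigned from another variable.

```java
class Ch9CompareDemo {

    // Equality test: true only if both variables refer to the same object
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Equivalence test: true if the two objects contain the same string value
    static boolean sameValue(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String word1 = new String("Java");
        String word2 = word1;               // refers to the same object as word1
        String word3 = new String("Java");  // different object, identical value
        String word4 = new String("Bali");  // different object, different value

        System.out.println(sameObject(word1, word2)); // true
        System.out.println(sameValue(word1, word2));  // true
        System.out.println(sameObject(word1, word3)); // false
        System.out.println(sameValue(word1, word3));  // true
        System.out.println(sameObject(word1, word4)); // false
        System.out.println(sameValue(word1, word4));  // false
    }
}
```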
Case A: Referring to the same object.
    word1 and word2 both refer to one String object containing Java.
    word1 == word2          is true
    word1.equals( word2 )   is true
    Note: If x == y is true, then x.equals(y) is also true. The reverse is not always true.
Case B: Referring to different objects having identical string values.
    word1 and word2 refer to two separate String objects, each containing Java.
    word1 == word2          is false
    word1.equals( word2 )   is true
Case C: Referring to different objects having different string values.
    word1 refers to a String object containing Bali; word2 refers to a String object containing Java.
    word1 == word2          is false
    word1.equals( word2 )   is false
Figure 9.2 The difference between the equality test and the equals method.
As long as we create a new String object as
String str = new String("Java");
using the new operator, the rule for comparing objects applies to comparing strings.
However, when the new operator is not used, for example, in
String str = "Java";
we have to be careful. Figure 9.3 shows the difference in assigning a String object
to a variable. If we do not use the new operator, then string data are treated as if they
were of a primitive data type. When we use the same literal String constant in a program,
there will be exactly one String object.
Using the new operator:
    String word1, word2;
    word1 = new String("Java");
    word2 = new String("Java");
    Whenever the new operator is used, there will be a new object.
    word1 and word2 refer to two separate String objects, each containing Java.
Not using the new operator:
    String word1, word2;
    word1 = "Java";
    word2 = "Java";
    A literal string constant such as "Java" will always refer to the one object.
    word1 and word2 refer to the same String object containing Java.
Figure 9.3 Difference between using and not using the new operator for String.
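The effect on the equality test can be checked with a short experiment. The class name Ch9LiteralDemo and its methods are hypothetical; the sketch only contrasts literal constants with objects created by the new operator.

```java
class Ch9LiteralDemo {

    // Two identical literal constants refer to the one String object
    static boolean literalsShareOneObject() {
        String word1 = "Java";
        String word2 = "Java";
        return word1 == word2;
    }

    // Each use of the new operator creates a separate String object
    static boolean newObjectsAreDistinct() {
        String word1 = new String("Java");
        String word2 = new String("Java");
        return word1 != word2;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject()); // true
        System.out.println(newObjectsAreDistinct());  // true

        // The equivalence test does not depend on how the objects were created
        String literal = "Java";
        String created = new String("Java");
        System.out.println(literal == created);       // false
        System.out.println(literal.equals(created));  // true
    }
}
```

Because the result of == depends on how a string was created, the equals method is the safer choice whenever we mean to compare string values.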
1. Show the state of memory after the following statements are executed.
String str1, str2, str3;
str1 = "Jasmine";
str2 = "Oolong";
str3 = str2;
str2 = str1;
9.6 StringBuffer and StringBuilder
A String object is immutable, which means that once a String object is created, we
cannot change it. In other words, we can read individual characters in a string, but
we cannot add, delete, or modify characters of a String object. Remember that the
methods of the String class, such as replaceAll and substring, do not modify the
original string; they return a new string. Java adopts this immutability restriction
to implement an efficient memory allocation scheme for managing String objects.
The immutability is the reason why we can treat the string data much as a primitive data type.
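A short experiment shows that methods such as replaceAll and substring leave the original string as it was. The class name Ch9ImmutableDemo, the method demo, and the sample sentence are assumptions made for this sketch.

```java
class Ch9ImmutableDemo {

    // Returns the original string together with two strings derived from it
    static String[] demo(String original) {
        String replaced = original.replaceAll("great", "fun"); // a new string
        String part     = original.substring(0, 4);            // another new string
        return new String[] { original, replaced, part };
    }

    public static void main(String[] args) {
        String[] result = demo("Java is great");
        System.out.println(result[0]); // Java is great (unchanged)
        System.out.println(result[1]); // Java is fun
        System.out.println(result[2]); // Java
    }
}
```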
Creating a new string from the old one will work for most cases, but sometimes manipulating the content of a string directly is more convenient. When we
need to compose a long string from a number of words, for example, being able to
manipulate the content of a string directly is much more convenient than creating a
new copy of a string. String manipulation here means operations such as replacing
a character, appending a string with another string, deleting a portion of a string, and
so forth. If we need to manipulate the content of a string directly, we must use either
the StringBuffer or the StringBuilder class. Here’s a simple example of modifying
the string Java to Diva using a StringBuffer object:
StringBuffer word = new StringBuffer( "Java" );
word.setCharAt(0, 'D');
word.setCharAt(1, 'i');
Notice that no new string is created; the original string Java is modified. Also, we
must use the new operator to create a StringBuffer object.
The StringBuffer and StringBuilder classes behave exactly the same (i.e., they
support the same set of public methods), but the StringBuilder class in general has a
better performance. The StringBuilder class is new to Java 2 SDK version 1.5, so it
cannot be used with the older versions of Java SDK. There are advanced cases
where you have to use the StringBuffer class, but for the sample string processing
programs in this book, we can use either one of them. Of course, to use the StringBuilder class, we must be using version 1.5 SDK. We can also continue to use the
StringBuffer class with version 1.5.
Because the StringBuffer class can be used with all versions of Java SDK, and
the string processing performance is not our major concern here, we will be using
the StringBuffer class exclusively in this book. If the string processing performance
is a concern, then all we have to do is to replace all occurrences of the word StringBuffer with StringBuilder in the program and run it with version 1.5 SDK.
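As an illustration of how small that change is, here is the Java-to-Diva example rewritten with StringBuilder. This is a sketch that assumes version 1.5 or later of the SDK; the class name Ch9StringBuilderDemo and the method javaToDiva are hypothetical.

```java
class Ch9StringBuilderDemo {

    // Same statements as the StringBuffer example; only the class name differs
    static String javaToDiva() {
        StringBuilder word = new StringBuilder("Java");
        word.setCharAt(0, 'D');
        word.setCharAt(1, 'i');
        return word.toString();
    }

    public static void main(String[] args) {
        System.out.println(javaToDiva()); // Diva
    }
}
```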
Let’s look at some examples using StringBuffer objects. The first example
reads a sentence and replaces all vowels in the sentence with the character X.
/*
Chapter 9 Sample Program: Replace every vowel in a given sentence
with 'X' using StringBuffer.
File: Ch9ReplaceVowelsWithX.java
*/
import javax.swing.*;
class Ch9ReplaceVowelsWithX {
    public static void main (String[] args) {
        StringBuffer tempStringBuffer;
        String       inSentence;
        int          numberOfCharacters;
        char         letter;
        inSentence = JOptionPane.showInputDialog(null,
                                                 "Enter a sentence:");
        tempStringBuffer   = new StringBuffer(inSentence);
        numberOfCharacters = tempStringBuffer.length();
        for (int index = 0; index < numberOfCharacters; index++) {
            letter = tempStringBuffer.charAt(index);
            if ( letter == 'a' || letter == 'A' ||
                 letter == 'e' || letter == 'E' ||
                 letter == 'i' || letter == 'I' ||
                 letter == 'o' || letter == 'O' ||
                 letter == 'u' || letter == 'U'    ) {
                tempStringBuffer.setCharAt(index,'X');
            }
        }
        System.out.println( "Input: " + inSentence + "\n");
        System.out.println( "Output: " + tempStringBuffer );
    }
}
Notice how the input routine is done. We are reading in a String object and
converting it to a StringBuffer object, because we cannot simply assign a String
object to a StringBuffer variable. For example, the following code is invalid:
Bad Version
StringBuffer strBuffer = JOptionPane.showInputDialog(null,
"Enter a sentence:");
We are required to create a StringBuffer object from a String object as in
String       str    = "Hello";
StringBuffer strBuf = new StringBuffer( str );
We cannot input StringBuffer objects. We have to input String objects and
convert them to StringBuffer objects.
Our next example constructs a new sentence from input words that have an
even number of letters. The program stops when the word STOP is read. Let’s begin
with the pseudocode:
set tempStringBuffer to empty string;
repeat = true;
while ( repeat ) {
read in next word;
if (word is "STOP") {
repeat = false;
} else if (word has even number of letters) {
append word to tempStringBuffer;
}
}
And here’s the actual code:
/*
Chapter 9 Sample Program: Constructs a new sentence from
input words that have an even number of letters.
File: Ch9EvenLetterWords.java
*/
import javax.swing.*;
class Ch9EvenLetterWords {
    public static void main (String[] args) {
        boolean repeat = true;
        String  word;
        //Create StringBuffer object with an empty string.
        StringBuffer tempStringBuffer = new StringBuffer("");
        while ( repeat ) {
            word = JOptionPane.showInputDialog(null, "Next word:");
            if ( word.equals("STOP") ) {
                repeat = false;
            } else if ( word.length() % 2 == 0 ) {
                //Append word and a space to tempStringBuffer.
                tempStringBuffer.append(word + " ");
            }
        }
        System.out.println( "Output: " + tempStringBuffer );
    }
}
We use the append method to append a String or a StringBuffer object to the
end of a StringBuffer object. The method append also can take an argument of the
primitive data type. For example, all the following statements are valid:
int   i  = 12;
float x  = 12.4f;
char  ch = 'W';
StringBuffer str = new StringBuffer("");
str.append(i);
str.append(x);
str.append(ch);
Any primitive data type argument is converted to a string before it is appended to a
StringBuffer object.
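To see what the StringBuffer holds after these three calls, the statements can be wrapped in a method that returns the result as a String. The class name Ch9AppendDemo and the method appendAll are hypothetical.

```java
class Ch9AppendDemo {

    // Appends an int, a float, and a char to an empty StringBuffer
    static String appendAll(int i, float x, char ch) {
        StringBuffer str = new StringBuffer("");
        str.append(i);   // appended as "12"
        str.append(x);   // appended as "12.4"
        str.append(ch);  // appended as "W"
        return str.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendAll(12, 12.4f, 'W')); // 1212.4W
    }
}
```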
Notice that we can write the second example using only String objects. Here’s
how:
boolean repeat = true;
String word, newSentence;
newSentence = ""; //empty string
while ( repeat ) {
    word = JOptionPane.showInputDialog(null, "Next word:");
    if ( word.equals("STOP") )
        repeat = false;
    else if ( word.length() % 2 == 0 )
        newSentence = newSentence + word; //string concatenation
}
Although this code does not explicitly use any StringBuffer object, the Java
compiler may use StringBuffer when compiling the string concatenation operator.
For example, the expression
newSentence + word
can be compiled as if the expression were
new StringBuffer().append(newSentence).append(word).toString()
Using the append method of StringBuffer is preferable to using the string concatenation operator + because we can avoid creating temporary string objects by using
StringBuffer.
In addition to appending a string at the end of StringBuffer, we can insert a string
at a specified position by using the insert method. The syntax for this method is
<StringBuffer> . insert ( <insertIndex>, <value> ) ;
where <insertIndex> must be greater than or equal to 0 and less than or equal to the
length of <StringBuffer> and the <value> is an object or a value of the primitive data
type. For example, to change the string
Java is great
to
Java is really great
we can execute
StringBuffer str = new StringBuffer("Java is great");
str.insert(8, "really ");
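The insert method can be tried at the two limits of <insertIndex> as well as in the middle. In this sketch the class name Ch9InsertDemo and the method insertAt are hypothetical; the first call repeats the example above.

```java
class Ch9InsertDemo {

    // Inserts value into text at the given index and returns the result
    static String insertAt(String text, int insertIndex, String value) {
        StringBuffer str = new StringBuffer(text);
        str.insert(insertIndex, value);
        return str.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertAt("Java is great", 8, "really ")); // Java is really great
        System.out.println(insertAt("great", 0, "Java is "));        // Java is great
        // Inserting at an index equal to the length has the same effect as append
        System.out.println(insertAt("Java", 4, "!"));                // Java!
    }
}
```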
1. Determine the value of str after the following statements are executed.
a. StringBuffer str
       = new StringBuffer( "Caffeine" );
   str.insert(0, "Dr. ");
b. String       str  = "Caffeine";
   StringBuffer str1 =
       new StringBuffer( str.substring(1, 3) );
   str1.append('e');
   str = "De" + str1;
c. String       str  = "Caffeine";
   StringBuffer str1 =
       new StringBuffer( str.substring(4, 8) );
   str1.insert (3,'f');
   str = "De" + str1;
2. Assume a String object str is assigned as a string value. Write a code segment
to replace all occurrences of lowercase vowels in a given string to the letter C
by using String and StringBuffer objects.
3. Find the errors in the following code.
String       str  = "Caffeine";
StringBuffer str1 = str.substring(1, 3);
str1.append('e');
System.out(str1);
str1 = str1 + str;
9.7 Sample Development
Building Word Concordance
One technique to analyze a historical document or literature is to track word occurrences. A basic form of word concordance is a list of all words in a document and the
number of times each word appears in the document. Word concordance is useful in
revealing the writing style of an author. For example, given a word concordance of
a document, we can scan the list and count the numbers of nouns, verbs, prepositions,
and so forth. If the ratios of these grammatical elements differ significantly between
two documents, there is a high probability that they were not written by the same person.
Another application of word concordance is seen in the indexing of a document, which,
for each word, lists the page numbers or line numbers where it appears in the document. In this sample development, we will build a word concordance of a given document, utilizing the string-processing technique we learned in this chapter.
One of the most popular search engine websites on the Internet today is
Google (www.google.com). At the core of their innovative technology is a
concordance of all Web pages on the Internet. Every month the company’s
Web crawler software visits 3 billion (and steadily growing) Web pages, and
from these visits, a concordance is built. When the user enters a query, the
Google servers search the concordance for a list of matching Web pages and
return the list in the order of relevance.
Problem Statement
Write an application that will build a word concordance of a document. The output from the application is an alphabetical list of all words in the given document
and the number of times they occur in the document. The document is a text
file (contents of the file are ASCII characters), and the output of the program is
saved as an ASCII file also.
Overall Plan
As usual, let’s begin the program development by first identifying the major tasks of the
program. The first task is to get a text document from a designated file. We will use a
helper class called FileManager to do this task. File processing techniques to implement
the FileManager class will be presented in Chapter 12. The whole content of an ASCII file
is represented in the program as a single String object. Using a pattern-matching technique, we extract individual words from the document. For each distinct word in the document, we associate a counter and increment it every time the word is repeated. We will
use the second helper class called WordList for maintaining a word list. An entry in this list
has two components: a word and how many times this word occurs in the document. A
WordList object can handle an unbounded number of entries. Entries in the list are
arranged in alphabetical order. We will learn how to implement the WordList class in
Chapter 10.
We can express the program logic in pseudocode as
while ( the user wants to process another file ) {
Task 1: read the file;
Task 2: build the word list;
Task 3: save the word list to a file;
}
Let’s look at the three tasks and determine objects that will be responsible for handling the tasks. For the first task, we will use the helper class FileManager. For the second
task of building a word list, we will define the Ch9WordConcordance class, whose instance will use the Pattern and Matcher classes for word extraction, and another helper
class WordList for maintaining the word list. The last task of saving the result is done by
the FileManager class also.
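Before the helper classes are available, the second task can be tried out in isolation. The following is only a rough sketch under stated assumptions, not the Ch9WordConcordance class developed in this section: it uses a java.util.TreeMap as a stand-in for WordList, treats a word as a sequence of word characters (\w+), and converts every word to lowercase. The class name Ch9ConcordanceSketch and the method build are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ch9ConcordanceSketch {

    // Assumption: a word is one or more word characters
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Returns the words of the document in alphabetical order with their counts.
    // The TreeMap is a stand-in for the WordList helper class.
    static Map<String, Integer> build(String document) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        Matcher matcher = WORD.matcher(document);
        while (matcher.find()) {
            String word = document.substring(matcher.start(), matcher.end())
                                  .toLowerCase();
            Integer oldCount = counts.get(word);
            counts.put(word, oldCount == null ? 1 : oldCount + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(build("The cat saw the other cat."));
        // {cat=2, other=1, saw=1, the=2}
    }
}
```

The pattern is compiled once and reused, and the find method is called repeatedly until it returns false, as in the Ch9PMCountJava program.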
Finally, we will define a top-level control object that manages all other objects. We
will call this class Ch9WordConcordanceMain. This will be our instantiable main class.
Here’s our working design document:
Design Document: Ch9WordConcordanceMain
Class                     Purpose
Ch9WordConcordanceMain    The instantiable main class of the program that implements the top-level program control.
Ch9WordConcordance        The key class of the program. An instance of this class manages other objects to build the word list.
FileManager               A helper class for opening a file and saving the result to a file. Details of this class can be found in Chapter 12.
WordList                  Another helper class for maintaining a word list. Details of this class can be found in Chapter 10.
Pattern/Matcher           Classes for pattern-matching operations.
Figure 9.4 is the working program diagram.
[Program diagram showing the classes Ch9WordConcordanceMain, Ch9WordConcordance, FileManager, WordList, Pattern, and Matcher. Legend: a helper class provided to us; a class we implement; system classes.]
Figure 9.4 The program diagram for the Ch9WordConcordanceMain program. Base system classes such as String and JOptionPane are not shown.
In lieu of the Pattern and Matcher classes, we could use the StringTokenizer class. This class is fairly straightforward to use if the white space
(tab, return, blank, etc.) is a word delimiter. However, using this class becomes
a little more complicated if we need to include punctuation marks and others as a word delimiter also. Overall, the Pattern and Matcher classes are
more powerful and useful in many types of applications than the StringTokenizer class.
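For comparison, here is a small sketch of the StringTokenizer alternative. With white space as the only delimiter, punctuation marks stay attached to the words; to separate them, each punctuation mark has to be listed as a delimiter. The class name Ch9TokenizerSketch, its methods, and the delimiter strings are assumptions made for this illustration.

```java
import java.util.StringTokenizer;

class Ch9TokenizerSketch {

    // Counts the tokens in text, using the given delimiter characters
    static int countWords(String text, String delimiters) {
        StringTokenizer tokenizer = new StringTokenizer(text, delimiters);
        int count = 0;
        while (tokenizer.hasMoreTokens()) {
            tokenizer.nextToken();
            count++;
        }
        return count;
    }

    // Returns the first token in text (text must contain at least one token)
    static String firstToken(String text, String delimiters) {
        return new StringTokenizer(text, delimiters).nextToken();
    }

    public static void main(String[] args) {
        String text = "Hello, world! Bye.";
        // White space only: the comma remains part of the first word
        System.out.println(firstToken(text, " \t\n\r"));     // Hello,
        // White space and punctuation marks as delimiters
        System.out.println(firstToken(text, " \t\n\r,.!?")); // Hello
        System.out.println(countWords(text, " \t\n\r,.!?")); // 3
    }
}
```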
We will implement this program in four major steps:
1. Start with a program skeleton. Define the main class with data members. To test
the main class, we will also define a skeleton Ch9WordConcordance class with
just a default constructor.
2. Add code to open a file and save the result. Extend the step 1 classes as necessary.
3. Complete the implementation of the Ch9WordConcordance class.
4. Finalize the code by removing temporary statements and tying up loose ends.
Step 1 Development: Skeleton
The design of Ch9WordConcordanceMain is straightforward, as its structure is very similar to that of other main classes. We will make this an instantiable main class and define
the start method that implements the top-level control logic. We will define a default
constructor to create instances of other classes. A skeleton Ch9WordConcordance class
is also defined in this step so we can compile and run the main class. The skeleton
Ch9WordConcordance class only has an empty default constructor. The working design
document for the Ch9WordConcordanceMain class is as follows:
Design Document: The Ch9WordConcordanceMain Class
Method           Visibility    Purpose
<constructor>    public        Creates the instances of other classes in the program.
start            private       Implements the top-level control logic of the program.
For the skeleton, the start method loops (doing nothing inside the loop in this
step) until the user selects No on the confirmation dialog. Here’s the skeleton:
/*
    Chapter 9 Sample Development: Word Concordance

    File: Step1/Ch9WordConcordanceMain.java
*/
import javax.swing.*;

class Ch9WordConcordanceMain {

    private FileManager fileManager;
    private Ch9WordConcordance builder;

    //-------------------------------
    //   Main method
    //-------------------------------
    public static void main(String[] args) {
        Ch9WordConcordanceMain main = new Ch9WordConcordanceMain();
        main.start();
    }

    public Ch9WordConcordanceMain() {
        fileManager = new FileManager( );
        builder     = new Ch9WordConcordance( );
    }

    private void start( ) {
        int reply;

        while (true) {
            reply = JOptionPane.showConfirmDialog(null,
                            "Run the program?",
                            "Word List Builder",
                            JOptionPane.YES_NO_OPTION);

            if (reply == JOptionPane.NO_OPTION) {
                break;
            }
        }

        JOptionPane.showMessageDialog(null,
                "Thank you for using the program\n"
                + "Good-Bye");
    }
}
The skeleton Ch9WordConcordance class has only an empty default constructor.
Here’s the skeleton class:
class Ch9WordConcordance {

    public Ch9WordConcordance() {
    }
}
step 1 test
We run the program and verify that the constructor is executed correctly, and the
repetition control in the start method works as expected.
Step 2 Development: Open and Save Files
step 2 design
In the second development step, we add routines to handle input and output. The tasks
of opening and saving a file are delegated to the service class FileManager. We will
learn the implementation details of the FileManager class in Chapter 12. Our responsibility right now is to use the class correctly. The class provides two key methods: one to
open a file and another to save a file. So that we can create and view the content easily,
the FileManager class deals only with text files. To save a text file, we call its saveFile
method. There are two versions. With the first version, we pass the filename and the string data to save. For example,
the code
FileManager fm = new FileManager();
String doc = ...; //assign string data
fm.saveFile("output1.txt", doc);
will save the string data doc to a file named output1.txt. With the second version, we will
let the end user select a file, using the standard file dialog. A sample file dialog is shown in
Figure 9.5. With the second version, we pass only the string data to be saved as
fm.saveFile(doc);
When there’s an error in saving a file, an IOException is thrown.
To open a text file, we use one of the two versions of the openFile method. The
distinction is identical to the one for the saveFile methods. The first version requires the
filename to open. The second version allows the end user to select a file to open,
so we pass no parameter. The openFile method will throw a FileNotFoundException
when the designated file cannot be found and an IOException when the designated file
cannot be opened correctly.
Figure 9.5 A sample file dialog for opening a file.
Here's the summary of the FileManager class:

Public Methods of FileManager

public String openFile(String filename)
        throws FileNotFoundException, IOException
    Opens the text file filename and returns the content as a String.

public String openFile( )
        throws FileNotFoundException, IOException
    Opens the text file selected by the end user, using the standard file open dialog,
    and returns the content as a String.

public String saveFile(String filename, String data)
        throws IOException
    Saves the string data to filename.

public String saveFile(String data) throws IOException
    Saves the string data to a file selected by the end user, using the standard file
    save dialog.
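Because the real FileManager is not shown until Chapter 12, the following is only a hedged sketch of how the two filename-based methods might behave. The class name SimpleFileManager is hypothetical, the implementation with java.nio.file is an assumption rather than the helper class's actual code, the file dialog versions are omitted, and saveFile returns nothing here for simplicity even though the summary declares a String return type.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the filename-based FileManager methods.
class SimpleFileManager {

    // Reads the whole text file and returns its content as one String.
    String openFile(String filename) throws FileNotFoundException, IOException {
        Path path = Paths.get(filename);
        if (!Files.exists(path)) {
            throw new FileNotFoundException(filename);
        }
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Writes the string data to the named file, replacing any old content.
    void saveFile(String filename, String data) throws IOException {
        Files.write(Paths.get(filename), data.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        SimpleFileManager fm = new SimpleFileManager();
        Path temp = Files.createTempFile("concordance", ".txt");
        fm.saveFile(temp.toString(), "one 14\ntwo 3\n");
        System.out.print(fm.openFile(temp.toString()));
        Files.delete(temp);
    }
}
```

A caller handles the two exceptions in the same way as with FileManager: FileNotFoundException when the file is missing, IOException for any other failure.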
We modify the start method to open a file, create a word concordance, and
then save the generated word concordance to a file. The method is defined as follows:
private void start( ) {
    int reply;
    String document, wordList;

    while (true) {
        reply = ...; //confirmation dialog reply

        if (reply == JOptionPane.NO_OPTION) {
            break;
        }

        //---- Added portion ----
        document = inputFile(); //open file
        wordList = build(document); //build concordance
        saveFile(wordList); //save the generated concordance
    }

    ... //'Good-bye' message dialog
}
The inputFile method is defined as follows:
private String inputFile( ) {
    String doc = "";

    try {
        doc = fileManager.openFile( );
    } catch (FileNotFoundException e) {
        System.out.println("File not found.");
    } catch (IOException e) {
        System.out.println("Error in opening file: "
                            + e.getMessage());
    }

    System.out.println("Input Document:\n" + doc); //TEMP
    return doc;
}
with a temporary output to verify the input routine. Because the openFile method of
FileManager throws exceptions, we handle them here with the try-catch block.
The saveFile method is defined as follows:
private void saveFile(String list) {
    try {
        fileManager.saveFile(list);
    } catch (IOException e) {
        System.out.println("Error in saving file: "
                            + e.getMessage());
    }
}
The method is very simple as the hard work of actually saving the text data is done by our
FileManager helper object.
Finally, the build method is defined as
private String build(String document) {
    String concordance;
    concordance = builder.build(document);
    return concordance;
}
The Ch9WordConcordanceMain class is now complete. To run and test this class,
we will define a stub build method for the Ch9WordConcordance class. The method is
temporarily defined as
public String build(String document) {
    //TEMP
    String list = "one 14\ntwo 3\nthree 3\nfour 5\nfive 92\n";
    return list;
    //TEMP
}
We will implement the method fully in the next step.
step 2 code
Here's the final Ch9WordConcordanceMain class:
/*
    Chapter 9 Sample Development: Word Concordance

    File: Step2/Ch9WordConcordanceMain.java
*/
import java.io.*;
import javax.swing.*;

class Ch9WordConcordanceMain {
    ...

    private String build(String document) {
        String concordance;
        concordance = builder.build(document);
        return concordance;
    }

    private String inputFile( ) {
        String doc = "";

        try {
            doc = fileManager.openFile( );
        } catch (FileNotFoundException e) {
            System.out.println("File not found.");
        } catch (IOException e) {
            System.out.println("Error in opening file: " + e.getMessage());
        }

        System.out.println("Input Document:\n" + doc); //TEMP
        return doc;
    }

    private void saveFile(String list) {
        try {
            fileManager.saveFile(list);
        } catch (IOException e) {
            System.out.println("Error in saving file: " + e.getMessage());
        }
    }

    private void start( ) {
        while (true) {
            ...
            document = inputFile();
            wordList = build(document);
            saveFile(wordList);
        }
        ...
    }
}
The temporary Ch9WordConcordance class now has the stub build method:
class Ch9WordConcordance {
    ...

    public String build(String document) {
        //TEMP
        String list = "one 14\ntwo 3\nthree 3\nfour 5\nfive 92\n";
        return list;
        //TEMP
    }
}
step 2 test
We are ready to run the program. The step 2 directory contains several sample
input files. We will open them and verify that the file contents are read correctly by checking the temporary echo print output to System.out. To verify the output routine, we
save the output (the temporary output created by the build method of Ch9WordConcordance) to a file and verify its content. Since the output is a text file, we can use any
word processor or text editor to view its contents. (Note: If we use Notepad on the Windows platform to view the file, it may not appear correctly. See the box below on how
to avoid this problem.)
The control characters used for a line separator are not the same on every platform (Windows, Mac, Unix, etc.). One platform may use \n for a line separator
while another platform may use \r\n. Even on the same
platform, different software may not interpret the control characters in the
same way. To make our Java code work correctly across all platforms, we write,
for example,
String newline = System.getProperties().getProperty("line.separator");
String output = "line 1" + newline + "line 2" + newline;
instead of
String output = "line 1\nline 2\n";
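As a small runnable sketch of the same idea, the hypothetical class NewlineDemo below (the class and method names are assumptions for illustration) builds multi-line output with the platform's own line separator. On Java 7 and later, System.lineSeparator() returns the same value.

```java
// Hypothetical demo class showing a platform-independent line separator.
class NewlineDemo {

    // Joins the given lines, ending each with the platform's line separator.
    static String joinLines(String... lines) {
        String newline = System.getProperty("line.separator");
        StringBuilder output = new StringBuilder();
        for (String line : lines) {
            output.append(line).append(newline);
        }
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines("line 1", "line 2"));
    }
}
```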
Step 3 Development: Generate Word Concordance
step 3 design
In the third development step, we finish the program by implementing the Ch9WordConcordance class, specifically, its build method. Since we are using another helper class
in this step, first we must find out how to use this helper class. The WordList class supports the maintenance of a word list. Every time we extract a new word from the document, we enter this word into a word list. If the word is already in the list, its count is incremented by 1. If the word occurs for the first time in the document, then the word is
added to the list with its count initialized to 1. When we are done processing the document, we can get the word concordance from a WordList by calling its getConcordance
method. The method returns the list as a single String with each line containing a word
and its count in the following format:
2    Chapter
1    Early
1    However
2    In
1    already
1    also
1    an
7    and
1    are
2    as
1    because
Because a single WordList object handles multiple documents, there’s a method
called reset to clear the word list before processing the next document. Here’s the
method summary:
Public Methods of WordList

public void add(String word)
    Increments the count for the given word. If the word is already in the list, its
    count is incremented by 1. If the word does not exist in the list, then it is added
    to the list with its count set to 1.

public String getConcordance( )
    Returns the word concordance in alphabetical order of words as a single string.
    Each line consists of a word and its count.

public void reset( )
    Clears the internal data structure so a new word list can be constructed. This
    method must be called every time before a new document is processed.
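The WordList class is supplied as a helper, so its internals are not shown here. As a rough sketch of how such a class could work, the hypothetical SimpleWordList below keeps the counts in a TreeMap. The class name, the choice of TreeMap, and the exact line layout (count, one space, word) are assumptions for illustration, not the helper class's actual implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the WordList helper class.
class SimpleWordList {

    // A TreeMap keeps its keys sorted, so the words come out in order.
    private final Map<String, Integer> counts = new TreeMap<String, Integer>();

    // Adds the word with a count of 1, or increments its count if present.
    void add(String word) {
        Integer count = counts.get(word);
        counts.put(word, count == null ? 1 : count + 1);
    }

    // Returns one line per distinct word: the count, a space, and the word.
    String getConcordance() {
        StringBuilder result = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            result.append(entry.getValue()).append(' ')
                  .append(entry.getKey()).append('\n');
        }
        return result.toString();
    }

    // Clears the list so a new document can be processed.
    void reset() {
        counts.clear();
    }
}
```

The natural ordering of String places all uppercase letters before lowercase ones, which agrees with the sample listing, where Chapter and Early come before already.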
The general idea behind the build method of the Ch9WordConcordance class is
straightforward. We need to keep extracting a word from the document, and for every
word found, we add it to the word list. Expressed in pseudocode, we have
while (document has more words) {
word = next word in the document;
wordList.add(word);
}
String concordance = wordList.getConcordance();
The most difficult part here is how to extract words from a document. We can write
our own homemade routine to extract words, based on the technique presented in
Section 9.2. However, this is too much work to get the task done. Writing code that
detects the various kinds of word terminators is not that easy: in addition to a space, punctuation marks and control characters such as tab and new line all qualify as word terminators.
Conceptually, it is not that hard, but it can be quite tedious to iron out all the details. Instead, we can use the pattern-matching technique provided by the Pattern and
Matcher classes for a reliable and efficient solution.
The pattern for finding a word can be stated in a regular expression as
\b\w+\b
Putting it in a string format results in
"\\b\\w+\\b"
The Pattern and Matcher objects are thus created as
Pattern pattern = Pattern.compile("\\b\\w+\\b");
Matcher matcher = pattern.matcher(document);
and the control loop to find and extract words is
wordList.reset();
while (matcher.find( )) {
wordList.add(document.substring(matcher.start(),
matcher.end()));
}
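To see exactly what this pattern treats as a word, the following sketch runs the same loop on a sentence with an apostrophe, digits, and an underscore. The class name WordExtractor and the sample sentence are assumptions made for illustration; the list stands in for the word list only to make the result visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it mirrors the control loop shown above.
class WordExtractor {

    private static final Pattern WORD = Pattern.compile("\\b\\w+\\b");

    // Returns every match of the word pattern, in document order.
    static List<String> extract(String document) {
        List<String> words = new ArrayList<String>();
        Matcher matcher = WORD.matcher(document);
        while (matcher.find()) {
            // The substring between start() and end() is the matched text.
            words.add(document.substring(matcher.start(), matcher.end()));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extract("Don't panic: JDK 1.4 has regex_support."));
        // [Don, t, panic, JDK, 1, 4, has, regex_support]
    }
}
```

Because \w matches letters, digits, and the underscore, an apostrophe or a period splits a token in two. Whether that is acceptable depends on how a word is defined for the concordance.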
step 3 code
Here’s the final Ch9WordConcordance class:
/*
    Chapter 9 Sample Development: Word Concordance

    File: Step3/Ch9WordConcordance.java
*/
import java.util.regex.*;

class Ch9WordConcordance {

    private static final String WORD = "\\b\\w+\\b";

    private WordList wordList;
    private Pattern pattern;

    public Ch9WordConcordance() {
        wordList = new WordList();
        pattern = Pattern.compile(WORD); //pattern is compiled only once
    }

    public String build(String document) {
        Matcher matcher = pattern.matcher(document);
        wordList.reset();

        while (matcher.find()) {
            wordList.add(document.substring(matcher.start(),
                                            matcher.end()));
        }

        return wordList.getConcordance();
    }
}
Notice how short the class is, thanks to the power of pattern matching and the
helper WordList class.
step 3 test
We run the program against varying types of input text files. We can use a long
document such as a term paper from last term's economics class (don't forget to save
it as a text file before testing). We should also use some specially created files for testing
purposes. One file may contain only one word repeated 7 times, for example. Another
file may contain no words at all. We verify that the program works correctly for all types
of input files.
Step 4 Development: Finalize
program review
As always, we finalize the program in the last step. We perform a critical review to find any
inconsistency or error in the methods, any incomplete methods, places to add more comments, and so forth.
In addition, we may consider possible extensions. One is an integrated user interface where the end user can view both the input document files and the output word list
files. Another is the generation of different types of lists. In the sample development, we
count the number of occurrences of each word. Instead, we can generate a list of positions where each word appears in the document. The WordList class itself needs to be
modified for such an extension.
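As a hedged sketch of the second extension, the code below records the starting index of every occurrence of each word instead of a count. The class name WordPositions, the use of a TreeMap of lists, and the choice of character index as the "position" are all assumptions; a modified WordList could store the data differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a position list; not part of the sample program.
class WordPositions {

    // Maps each word to the character indexes at which it starts.
    static Map<String, List<Integer>> build(String document) {
        Map<String, List<Integer>> positions = new TreeMap<String, List<Integer>>();
        Matcher matcher = Pattern.compile("\\b\\w+\\b").matcher(document);
        while (matcher.find()) {
            String word = matcher.group();
            List<Integer> list = positions.get(word);
            if (list == null) {
                list = new ArrayList<Integer>();
                positions.put(word, list);
            }
            list.add(matcher.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(build("to be or not to be"));
        // {be=[3, 16], not=[9], or=[6], to=[0, 13]}
    }
}
```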
S u m m a r y
• The char data type represents a single character.
• The char constant is denoted by single quotation marks, for example, 'a'.
• The character coding scheme used widely today is ASCII (American
  Standard Code for Information Interchange).
• Java uses Unicode, which is capable of representing characters of diverse
  languages. Unicode is compatible with ASCII: the first 128 Unicode codes are the ASCII codes.
• A string is a sequence of characters, and in Java, strings are represented by
  String objects.
• The Pattern and Matcher classes were introduced in Java 2 SDK 1.4. They
  provide support for pattern-matching applications.
• A regular expression is used to represent a pattern to match (search for) in a given
  text.
• The String objects are immutable. Once they are created, they cannot be
  changed.
• To manipulate mutable strings, use StringBuffer.
• Strings are objects in Java, and the rules for comparing objects apply when
  comparing strings.
• Only one String object is created for the same literal String constants.
• The standard classes described or used in this chapter are
      String          Pattern
      StringBuffer    Matcher
      StringBuilder
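The two points about comparing strings and sharing literal constants can be checked with a short sketch. The class name StringEqualityDemo and its helper methods are hypothetical and exist only for this illustration.

```java
// Hypothetical demo of equality (==) versus equivalence (equals) for strings.
class StringEqualityDemo {

    // True only when both variables refer to the very same object.
    static boolean sameObject(String x, String y) {
        return x == y;
    }

    // True when the two strings contain the same sequence of characters.
    static boolean sameContent(String x, String y) {
        return x.equals(y);
    }

    public static void main(String[] args) {
        String a = "Java";
        String b = "Java";             // same literal, so the same object as a
        String c = new String("Java"); // a new, distinct object with equal content

        System.out.println(sameObject(a, b));  // true
        System.out.println(sameObject(a, c));  // false
        System.out.println(sameContent(a, c)); // true
    }
}
```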
K e y   C o n c e p t s
characters
strings
string processing
regular expression
pattern matching
character encoding
String comparison
E x e r c i s e s
1. What is the difference between 'a' and "a"?
2. Discuss the difference between
str = str + word; //string concatenation
and
tempStringBuffer.append(word)
where str is a String object and tempStringBuffer is a StringBuffer object.
3. Show that if x and y are String objects and x == y is true, then x.equals(y) is
also true, but the reverse is not necessarily true.
4. What will be the output from the following code?
StringBuffer word1, word2;
word1 = new StringBuffer("Lisa");
word2 = word1;
word2.insert(0, "Mona ");
System.out.println(word1);
5. Show the state of memory after the execution of each statement in the
following code.
String word1, word2;
word1 = "Hello";
word2 = word1;
word1 = "Java";
6. Using a state-of-memory diagram, illustrate the difference between a null
string and an empty string—a string that has no characters in it. Show the
state-of-memory diagram for the following code. Variable word1 is a null
string, while word2 is an empty string.
String word1, word2;
word1 = null;
word2 = "";
7. Draw a state-of-memory diagram for each of the following groups of
statements.
String word1, word2;
word1 = "French Roast";
word2 = word1;

String word1, word2;
word1 = "French Roast";
word2 = "French Roast";
8. Write a GUI application that reads in a character and displays the character’s
ASCII code. The getText method of the JTextField class returns a String object, so
you need to extract a char value, as in
String inputString = inputField.getText();
char character = inputString.charAt(0);
Display an error message if more than one character is entered.
9. Write a method that returns the number of uppercase letters in a String object
passed to the method as an argument. Use the class method isUpperCase of
the Character class, which returns true if the passed parameter of type char
is an uppercase letter. You need to explore the Character class from the
java.lang package on your own.
10. Redo Exercise 9 without using the Character class. Hint: The ASCII code of any
uppercase letter will fall between 65 (code for 'A') and 90 (code for 'Z').
11. Write a program that reads a sentence and prints out the sentence with all
uppercase letters changed to lowercase and all lowercase letters changed to
uppercase.
12. Write a program that reads a sentence and prints out the sentence in reverse
order. For example, the program will display
?uoy era woH
for the input
How are you?
13. Write a method that transposes words in a given sentence. For example,
given an input sentence
The gate to Java nirvana is near
the method outputs
ehT etag ot avaJ anavrin si raen
To simplify the problem, you may assume the input sentence contains no
punctuation marks. You may also assume that the input sentence starts with a
nonblank character and that there is exactly one blank space between the
words.
14. Improve the method in Exercise 13 by removing the assumptions. For
example, an input sentence could be
Hello, how are you? I use JDK 1.2.2.
Bye-bye.
An input sentence may contain punctuation marks and more than one blank
space between two words. Transposing the above will result in
olleH, woh era uoy? I esu KDJ 1.2.2. eyB-eyb.
Notice the position of punctuation marks does not change and only one
blank space is inserted between the transposed words.
15. The Ch9CountWords program that counts the number of words in a given
sentence has a bug. If the input sentence has one or more blank spaces at the
end, the value for wordCount will be 1 more than the actual number of
words in the sentence. Correct this bug in two ways: one with the trim
method of the String class and another without using this method.
16. The Ch9ExtractWords program for extracting words in a given sentence
includes the test
if (beginIdx != endIdx) ...
Describe the type of input sentences that will result in the variables beginIdx
and endIdx becoming equal.
17. Write an application that reads in a sentence and displays the count of
individual vowels in the sentence. Use any output routine of your
choice to display the result in this format. Count only the lowercase
vowels.
Vowel counts for the sentence
Mary had a little lamb.
# of 'a' : 4
# of 'e' : 1
# of 'i' : 1
# of 'o' : 0
# of 'u' : 0
18. Write an application that determines if an input word is a palindrome. A
palindrome is a string that reads the same forward and backward, for
example, noon and madam. Ignore the case of the letter. So, for example,
maDaM, MadAm, and mAdaM are all palindromes.
19. Write an application that determines if an input sentence is a palindrome, for
example, A man, a plan, a canal, Panama! You ignore the punctuation
marks, blanks, and case of the letters.
Development Exercises
For the following exercises, use the incremental development methodology to
implement the program. For each exercise, identify the program tasks, create a
design document with class descriptions, and draw the program diagram. Map out
the development steps at the start. Present any design alternatives and justify your
selection. Be sure to perform adequate testing at the end of each development step.
20. Write an Eggy-Peggy program. Given a string, convert it to a new string by
placing egg in front of every vowel. For example, the string
I Love Java
becomes
eggI Leggovegge Jeggavegga
21. Write a variation of the Eggy-Peggy program. Implement the following four
variations:
• Sha: Add sha to the beginning of every word.
• Na: Add na to the end of every word.
• Sha Na Na: Add sha to the beginning and na na to the end of every word.
• Ava: Move the first letter to the end of the word and add ava to it.
Allow the user to select one of four possible variations. Use JOptionPane for
input.
22. Write a word guessing game. The game is played by two players, each
taking a turn in guessing the secret word entered by the other player. Ask the
first player to enter a secret word. After a secret word is entered, display a
hint that consists of a row of dashes, one for each letter in the secret word.
Then ask the second player to guess a letter in the secret word. If the letter is
in the secret word, replace the dashes in the hint with the letter at all
positions where this letter occurs in the word. If the letter does not appear in
the word, the number of incorrect guesses is incremented by 1. The second
player keeps guessing letters until either
• The player guesses all the letters in the word.
or
• The player makes 10 incorrect guesses.
Here's a sample interaction. Each single letter on a line by itself is a guess entered by the player, and each row of dashes and letters is the hint displayed in response:
- - - -
S
- - - -
A
- A - A
V
- A V A
D
- A V A
J
J A V A
Bingo! You won.
Support the following features:
• Accept an input in either lowercase or uppercase.
• If the player enters something other than a single letter (a digit, special
character, multiple letters, etc.), display an error message. The number
of incorrect guesses is not incremented.
• If the player enters the same correct letter more than once, reply with
the previous hint.
• Entering an incorrect letter the second time is counted as another
wrong guess. For example, suppose the letter W is not in the secret
word. Every time the player enters W as a guess, the number of
incorrect guesses is incremented by 1.
After a game is over, switch the role of players and continue with another
game. When it is the first player’s turn to enter a secret word, give an option
to the players to stop playing. Keep the tally and announce the winner at the
end of the program. The tally will include for each player the number of
wins and the total number of incorrect guesses made for all games. The
player with more wins is the winner. In the case where both players have the
same number of wins, the one with the lower number of total incorrect
guesses is the winner. If the total numbers of incorrect guesses for both
players are the same also, then it is a draw.
23. Write another word guessing game similar to the one described in
Exercise 22. For this word game, instead of using a row of dashes for a
secret word, a hint is provided by displaying the letters in the secret word in
random order. For example, if the secret word is COMPUTER, then a
possible hint is MPTUREOC. The player has only one chance to enter a
guess. The player wins if he guessed the word correctly. Time how long the
player took to guess the secret word. After a guess is entered, display
whether the guess is correct or not. If correct, display the amount of time in
minutes and seconds used by the player.
The tally will include for each player the number of wins and the total
amount of time taken for guessing the secret words correctly (amount of
time used for incorrect guesses is not tallied). The player with more wins is
the winner. In the case where both players have the same number of wins,
the one who used the lesser amount of time for correct guesses is the winner.
If the total time used by both players is the same also, then it is a draw.
24. The word game Eggy-Peggy is an example of encryption. Encryption has
been used since ancient times to communicate messages secretly. One of the
many techniques used for encryption is called a Caesar cipher. With this
technique, each character in the original message is shifted N positions. For
example, if N = 1, then the message
I drink only decaf
becomes
J!esjol!pomz!efdbg
The encrypted message is decrypted to the original message by shifting
back every character N positions. Shifting N positions forward and backward
is achieved by converting the character to ASCII and adding or subtracting N.
Write an application that reads in the original text and the value for N and
displays the encrypted text. Make sure the ASCII value resulting from
encryption falls between 32 and 126. For example, if you add 8 (value of N )
to 122 (ASCII code for ‘z’ ), you should “wrap around” and get 35.
Write another application that reads the encrypted text and the value
for N and displays the original text by using the Caesar cipher technique.
Design a suitable user interface.
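The wrap-around arithmetic mentioned in Exercise 24 can be sketched as follows. This illustrates only the range handling and is not a complete solution; the class name CaesarShift and the method shift are hypothetical names chosen for this example.

```java
// Hypothetical helper showing wrap-around within the codes 32 through 126.
class CaesarShift {

    private static final int LOW = 32;               // code for the blank space
    private static final int HIGH = 126;             // code for '~'
    private static final int RANGE = HIGH - LOW + 1; // 95 codes in the range

    // Shifts a code in [32, 126] by n positions, wrapping around the range.
    // A negative n shifts backward, which is what decryption needs.
    static int shift(int code, int n) {
        int offset = (code - LOW + n) % RANGE;
        if (offset < 0) {
            offset += RANGE; // the % operator can yield a negative remainder
        }
        return LOW + offset;
    }

    public static void main(String[] args) {
        System.out.println(shift(122, 8));  // 35
        System.out.println(shift(35, -8));  // 122
    }
}
```

Working on the offset from 32 rather than on the raw code keeps the modulo arithmetic simple: 122 + 8 = 130 is 4 past the top of the range, which lands on 35.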
25. Another encryption technique is called a Vigenère cipher. This technique is
similar to a Caesar cipher in that a key is applied cyclically to the original
message. For this exercise a key is composed of uppercase letters only.
Encryption is done by adding the code values of the key’s characters to the
code values of the characters in the original message. Code values for the
key characters are assigned as follows: 0 for A, 1 for B, 2 for C, . . . , and 25
for Z. Let’s say the key is COFFEE and the original message is I drink only
decaf. Encryption works as follows:
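The key is applied cyclically to the characters of the message
(C O F F E E C O F F E E C O F F E E), one key character per message character:

    Original     Key     Encrypted
    I          +  C   =   K
    (space)    +  O   =   .
    d          +  F   =   i
    r          +  F   =   w
    . . .
    f          +  E   =   j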
Decryption reverses the process to generate the original message. Write
an application that reads in a text and displays the encrypted text. Make
sure the ASCII value resulting from encryption or decryption falls
between 32 and 126. You can get the code for key characters by (int)
keyChar - 65.
Write another application that reads the encrypted text and displays the
original text, using the Vigenère cipher technique.
26. Public-key cryptography allows anyone to encode messages while only
people with a secret key can decipher them. In 1977, Ronald Rivest, Adi
Shamir, and Leonard Adleman developed a form of public-key cryptography
called the RSA system.
To encode a message using the RSA system, one needs n and e. The
value n is a product of any two prime numbers p and q. The value e is any
number less than n that cannot be evenly divided into y (that is, $y \div e$ would
have a remainder), where $y = (p - 1) \times (q - 1)$. The values n and e can be
published in a newspaper or posted on the Internet, so anybody can encrypt
messages. The original character is encoded to a numerical value c by using
the formula
$c = m^{e} \bmod n$
where m is a numerical representation of the original character (for example,
1 for A, 2 for B, and so forth).
Now, to decode a message, one needs d. The value d is a number that
satisfies the formula
$e \cdot d \bmod y = 1$
where e and y are the values defined in the encoding step. The original
character m can be derived from the encrypted character c by using the
formula
$m = c^{d} \bmod n$
Write a program that encodes and decodes messages using the RSA system.
Use large prime numbers for p and q in computing the value for n, because
when p and q are small, it is not that difficult to find the value of d. When p
and q are very large, however, it becomes practically impossible to
determine the value of d. Use the ASCII values as appropriate for the
numerical representation of characters. Visit http://www.rsasecurity.com for
more information on how the RSA system is applied in the real world.
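The formulas in Exercise 26 can be traced with deliberately tiny primes. The sketch below is a worked illustration only: the class name TinyRsa, its method names, and the values p = 61, q = 53, and e = 17 are assumptions chosen for readability, and primes this small offer no security. It relies on the modPow and modInverse methods of java.math.BigInteger; modInverse succeeds only when e and y share no common factor other than 1.

```java
import java.math.BigInteger;

// Hypothetical worked example of the RSA formulas with tiny primes.
class TinyRsa {

    // Computes base^exponent mod n; serves for both encoding and decoding.
    static BigInteger transform(BigInteger base, BigInteger exponent, BigInteger n) {
        return base.modPow(exponent, n);
    }

    // Finds d such that (e * d) mod y = 1.
    static BigInteger privateExponent(BigInteger e, BigInteger y) {
        return e.modInverse(y);
    }

    public static void main(String[] args) {
        BigInteger p = BigInteger.valueOf(61);
        BigInteger q = BigInteger.valueOf(53);
        BigInteger n = p.multiply(q);                          // 3233
        BigInteger y = p.subtract(BigInteger.ONE)
                        .multiply(q.subtract(BigInteger.ONE)); // 3120
        BigInteger e = BigInteger.valueOf(17);
        BigInteger d = privateExponent(e, y);                  // 2753

        BigInteger m = BigInteger.valueOf(65);                 // ASCII code for 'A'
        BigInteger c = transform(m, e, n);                     // encoded value
        BigInteger back = transform(c, d, n);                  // decoded value, 65 again

        System.out.println("n = " + n + ", d = " + d);
        System.out.println("m = " + m + ", c = " + c + ", decoded = " + back);
    }
}
```

The numerical representation m must be smaller than n for decoding to recover it, which is why even this toy example uses an n larger than any ASCII code.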