BIT 2206/CSC 1209
INTRODUCTION TO
PROGRAMMING IN JAVA: THE
CHARACTER TYPE
LESSON 1
CONTENTS
1. Introduction
2. ASCII Code
3. Alternative codes
4. The Character class
5. Character input
6. Example problem - lower to upper case
conversion
6.1. Requirements
6.2. Analysis
6.3. Design
6.4. Implementation
6.5. Testing
7. The System.out.flush method
The example combines the two-statement input data declaration used up until now into a single
statement. It also introduces the concepts of Boundary Value Analysis (BVA) and limit testing.
1. INTRODUCTION
The Java type char is used for handling single characters such as letters, digits and
special symbols (e.g. question mark, full stop, colon etc.), or non-printable control characters
(e.g. tab, newline etc.). In Java (as in many other programming languages) characters are
written by enclosing them in single quotes. Examples:
'a'
'A'
'2'
'+'
'\''   (a single quote, which must be escaped with a backslash)
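As a minimal sketch of the notation above (the class name CharLiteralDemo and its method names are illustrative assumptions, not part of the lesson), the following declares a few char literals, including the escaped forms needed for the single quote and for control characters:

```java
// Illustrative sketch: the class and method names are assumptions, not from the lesson.
class CharLiteralDemo {
    // A single quote must be escaped with a backslash inside a char literal.
    static char singleQuote() { return '\''; }

    // Control characters such as tab and newline are also written as escape sequences.
    static char tab() { return '\t'; }
    static char newline() { return '\n'; }

    public static void main(String[] args) {
        char letter = 'a';
        char digit = '2';    // the character '2', not the number 2
        char symbol = '+';
        System.out.println(letter + " " + digit + " " + symbol + " " + singleQuote());
    }
}
```

Note that '2' is a character and is stored as a character code; it is not the integer 2.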
2. ASCII CODE
In the early days of computing, characters were usually stored in a computer using a group of
8 bits, i.e. a byte. Originally, only seven of these bits were used. The eighth, most significant
bit, referred to as the parity bit, was used for error checking. Using only seven bits there are
128 different character codes available (2^7). There is a generally accepted standard, called
the ASCII standard, which determines which characters can be encoded using the seven
available bits, and which character code represents which character. ASCII (pronounced
"ass-key") is an acronym for American Standard Code for Information Interchange.
3. ALTERNATIVE CODES
The ASCII standard was developed on the assumption that all computer usage would be in
English. The English alphabet has 26 letters derived from the Latin alphabet. This set of
letters is sufficient for only a small group of languages, e.g. English, Swahili and Hawaiian!
All other living languages use either the Latin alphabet plus other characters, or other
non-Latin alphabets, or syllabaries. Use of the ASCII standard therefore presents a problem in
many countries.
3.1 LATIN-1 CODE
The obvious solution to the above problem is to drop the use of the parity bit so
that 256 character codes are available. There are a number of "8 bit" character standards
available. Some languages (for example Ada) use what is commonly referred to as the
LATIN-1 standard (ISO 8859-1). In this standard the first 128 codes (0 to 127) adhere to the
ASCII standard, while the remaining codes (128 to 255) provide for additional characters.
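The relationship between ASCII and LATIN-1 can be sketched as follows. This is an illustration only: the class Latin1Demo and its helper methods are assumed names, and the accented letter is written as a Unicode escape so that the example does not depend on the editor's encoding. It relies on the standard StandardCharsets.ISO_8859_1 constant to encode each character as a single byte.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: the class and method names are assumptions, not from the lesson.
class Latin1Demo {
    // Encode a string as ISO 8859-1; each supported character occupies one byte.
    static byte[] encode(String s) { return s.getBytes(StandardCharsets.ISO_8859_1); }

    // Interpret a (signed) Java byte as an unsigned code in the range 0..255.
    static int unsigned(byte b) { return b & 0xFF; }

    public static void main(String[] args) {
        byte[] plain = encode("A");          // a character within the ASCII range
        byte[] accented = encode("\u00e9");  // e with an acute accent, outside ASCII
        System.out.println("A       -> " + unsigned(plain[0]));
        System.out.println("e-acute -> " + unsigned(accented[0]));
    }
}
```

The letter A keeps its ASCII code (65), while the accented letter receives a code above 127 (233) that still fits in one byte.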
3.2 Unicode Worldwide Character Standard
The Unicode Worldwide Character Standard is a character coding system which was originally
designed so that characters are stored in two bytes of memory (i.e. 16 bits as opposed to 8 bits).
"At time of writing" the Unicode standard contained 34,168 distinct coded characters. Java uses
the Unicode Standard, and a Java char is a 16 bit value.
Provided that we have an editor that supports the Unicode character set we can include any of
the Unicode characters in our Java programs.
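A short sketch of what this means in a program is given below. The class name UnicodeDemo and its methods are illustrative assumptions; Character.SIZE and Character.MAX_VALUE are constants of the standard Character class. The Greek letter is written as a Unicode escape so that the example works even in an editor without Unicode support.

```java
// Illustrative sketch: the class and method names are assumptions, not from the lesson.
class UnicodeDemo {
    // Number of bits used to store one Java char.
    static int bitsPerChar() { return Character.SIZE; }

    // Largest code that a single char can hold.
    static int maxCode() { return Character.MAX_VALUE; }

    public static void main(String[] args) {
        char omega = '\u03A9';   // Greek capital letter omega, written as a Unicode escape
        System.out.println("bits per char     = " + bitsPerChar());
        System.out.println("largest char code = " + maxCode());
        System.out.println("code of omega     = " + (int) omega);
    }
}
```

Only the character codes are printed here, because whether the character itself displays correctly depends on the console being used.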
4. THE Character CLASS
The Character class contains many useful methods for manipulating and testing
characters. A fragment of this class is presented in Figure 1. This fragment
includes the following:

- Character: constructor to create an instance of the class Character so that it
  represents the primitive value given as its argument.
- charValue: returns the value of an instance of the class Character.
- getNumericValue: returns the integer value that the specified character represents
  (e.g. 7 for the character '7'), or a negative value if the character has no such value.
- isDigit: determines if the specified character is a digit.
- isLetter: determines if the specified character is a letter.
- toLowerCase: maps the given character to its lowercase equivalent; if the character
  has no lowercase equivalent, the character itself is returned.
- toUpperCase: maps the given character to its uppercase equivalent; if the character
  has no uppercase equivalent, the character itself is returned.
Figure 1: Class diagram for Character class
Note: the last five methods above are all class methods, so they are invoked by linking the
desired method to the class name Character, e.g.:
Character.isLetter(n);
where n is a data item of type char. Note also that the Character class contains many
methods of the form is... for carrying out various tests on values of type char.
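The methods listed above can be tried out with a sketch such as the following. The class CharacterMethodsDemo and its helper methods are illustrative assumptions; the Character methods called are those of the standard class. Character.valueOf is used here in place of the constructor to obtain a Character instance, which is an assumption about preferred style in recent Java versions rather than something stated in the lesson.

```java
// Illustrative sketch: the class and helper names are assumptions, not from the lesson.
class CharacterMethodsDemo {
    // Summarise what some of the Character class methods report for one char.
    static String describe(char c) {
        return "letter=" + Character.isLetter(c)
             + " digit=" + Character.isDigit(c)
             + " upper=" + Character.toUpperCase(c)
             + " lower=" + Character.toLowerCase(c);
    }

    // The integer value a character represents, e.g. 7 for '7'.
    static int numericValue(char c) { return Character.getNumericValue(c); }

    public static void main(String[] args) {
        System.out.println(describe('q'));
        System.out.println(describe('7'));
        System.out.println("numeric value of '7' = " + numericValue('7'));

        // Wrap a char in a Character instance and read it back with charValue.
        Character wrapped = Character.valueOf('q');
        System.out.println("charValue = " + wrapped.charValue());
    }
}
```

Note that describe is called with the class name omitted only because it lives in the same class; the Character methods themselves are always prefixed with Character.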
5. CHARACTER INPUT
Input using the next method in the Scanner class is always in the form of a
string. If, for example, we want integers or doubles we use the nextInt or nextDouble
methods respectively. However, there is no "nextChar" method. There are mechanisms
for getting a single "char" from the input stream but at present we do not have
sufficient knowledge to do this. What we can do, however, is input a character as an
ASCII integer and convert it to a "char" using a cast. Thus:
int inputInt = input.nextInt();
char inputChar = (char) inputInt;
where input (in input.nextInt()) is an instance of the Scanner class. Of course we can run the
two statements together as follows:
char inputChar = (char) input.nextInt();
The code example presented in Table 1 indicates how two characters may be input.
// CHARACTER INPUT APPLICATION
// Pius Nyaanga
// Thursday 3 July 2013
// St. Paul’s University
import java.util.*;

class CharacterInputApp {
    // ------------------- FIELDS ------------------------
    // Create Scanner class instance
    private static Scanner input = new Scanner(System.in);

    // ------------------ METHODS ------------------------
    public static void main(String[] args)
    {
        // Invite input
        System.out.println("Input two character codes separated by a " +
                           "carriage return:");
        // Read in each input as an integer code and cast it to a char.
        char inputChar1 = (char) input.nextInt();
        char inputChar2 = (char) input.nextInt();
        // Output the result
        System.out.println("input 1 = " + inputChar1 + " input 2 = " +
                           inputChar2);
    }
}
Table 1: Character input code example
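For comparison, one of the other mechanisms alluded to above can be sketched as follows. This goes slightly beyond what the lesson has covered: it assumes the String method charAt, which returns the character at a given position in a string. The class CharFromTokenDemo and its method names are illustrative, and a Scanner over a fixed string stands in for keyboard input so that the sketch can be run without typing anything.

```java
import java.util.Scanner;

// Illustrative sketch: the class and method names are assumptions, not from the lesson.
class CharFromTokenDemo {
    // Read the next token as an integer code and cast it to a char, as in Table 1.
    static char readCharFromCode(Scanner in) { return (char) in.nextInt(); }

    // Read the next token as a string and return its first character.
    static char readChar(Scanner in) { return in.next().charAt(0); }

    public static void main(String[] args) {
        // A Scanner over a fixed string stands in for keyboard input here.
        Scanner input = new Scanner("97 b");
        char first = readCharFromCode(input);   // the code 97 cast to a char
        char second = readChar(input);          // the first character of the token "b"
        System.out.println("input 1 = " + first + " input 2 = " + second);
    }
}
```

Replacing new Scanner("97 b") with new Scanner(System.in) would read the same two items from the keyboard.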
6. EXAMPLE PROBLEM LOWER TO UPPER CASE
CONVERSION
6.1 Requirements
To produce a program that converts lower case alphabetic characters to upper case
alphabetic characters (Figure 2). Note that lower case letters a..z have Unicode values
97..122, and upper case letters A..Z have Unicode values 65..90. Therefore to convert from
lower case to upper case we must subtract 32 from the Unicode value of the input character.
Figure 2: Lower to uppercase character conversion
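The arithmetic behind this requirement can be checked with a small sketch. The class CaseOffsetDemo and its methods are illustrative names only; the offset is computed from the character literals rather than typed in, to show where the figure 32 comes from.

```java
// Illustrative sketch: the class and method names are assumptions, not from the lesson.
class CaseOffsetDemo {
    // Distance between a lower case letter and its upper case form ('a' is 97, 'A' is 65).
    static int offset() { return 'a' - 'A'; }

    // Convert a lower case letter code to the upper case code by subtracting the offset.
    // No range check is made, so codes outside 97..122 are shifted as well.
    static int toUpperCode(int lowerCode) { return lowerCode - offset(); }

    public static void main(String[] args) {
        System.out.println("offset = " + offset());
        System.out.println("97  -> " + toUpperCode(97) + " (" + (char) toUpperCode(97) + ")");
        System.out.println("122 -> " + toUpperCode(122) + " (" + (char) toUpperCode(122) + ")");
    }
}
```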
6.2 Analysis
Using "noun extraction" the class diagram presented
in Figure 3 is proposed.
6.3 Design
From Figure 3 the analysis indicates that we need to
design a single class, Lower2UpperApp; all other
methods used are contained in existing classes that
come with the Java API.
Figure 3: Lower to Upper case class
diagram
6.3.1 Lower2UpperApp Class
Field Summary
private static Scanner input
A class instance field to facilitate input from the input
stream.
Method Summary
public static void main(String[] args)
Main method to read in a character from the keyboard as a
Unicode value, output this value (i.e. "echo" to the screen), and then
convert to upper case equivalent by subtracting 32. Output this new
Unicode value and the associated character.
A Nassi-Shneiderman chart for the main method is given in
Figure 4.
Figure 4: Nassi-Shneiderman charts for Lower2UpperApp
class method
6.4. Implementation
6.4.1 Lower2UpperApp Class
The implementation for the Lower2UpperApp class is given in Table 2. Points to note:
1. We use the nextInt method contained in the Scanner class to input a Unicode
integer.
2. To convert a Unicode value into its character we use a cast:
   character = (char) unicodeValue;
// LOWER 2 UPPER APPLICATION
// Pius Nyaanga
// Tuesday 2 March 2013
// Wednesday 30 June 2013
// St. Paul’s University
import java.util.*;

class Lower2UpperApp {
    // ------------------- FIELDS ------------------------
    // Create Scanner class instance
    private static Scanner input = new Scanner(System.in);

    // ------------------ METHODS ------------------------
    public static void main(String[] args) {
        char upperCaseChar;
        int uniCodeValue;

        // Input a Unicode value and output associated character
        System.out.print("Input a Unicode value: ");
        uniCodeValue = input.nextInt();
        System.out.println("Character equivalent is: " +
                           (char) uniCodeValue);

        // Subtract 32 to find uppercase equivalent and output.
        uniCodeValue = uniCodeValue - 32;
        System.out.println("Unicode upper case equivalent is: " +
                           uniCodeValue);
        upperCaseChar = (char) uniCodeValue;
        System.out.println("Upper case character is: " +
                           upperCaseChar);
    }
}
Table 2: Lower to upper case conversion application (Version 1)
Of course to be in tune with the spirit of OOP we should not write code where appropriate
alternative pre-defined methods already exist (code reuse). Inspection of the Character class
indicates that there is a method toUpperCase already available. Thus an alternative encoding
for the above might be as follows:
// LOWER 2 UPPER APPLICATION VERSION 2
// St. Paul’s University
// Tuesday 2 March 2013
import java.util.*;

class Lower2UpperApp2 {
    // ------------------- FIELDS ------------------------
    // Create Scanner class instance
    private static Scanner input = new Scanner(System.in);

    // ------------------ METHODS ------------------------
    public static void main(String[] args) {
        char lowerCaseChar, upperCaseChar;

        // Input a Unicode value and cast it to the associated character
        System.out.print("Input a Unicode value: ");
        lowerCaseChar = (char) input.nextInt();

        // Convert to uppercase equivalent and output.
        upperCaseChar = Character.toUpperCase(lowerCaseChar);
        System.out.println("Upper case character is: " +
                           upperCaseChar);
    }
}
Table 3: Lower to upper case conversion application (Version 2)
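The two versions do not behave identically for every input, which a short sketch can make visible. The class ConversionComparisonDemo and its method names are illustrative assumptions; the sketch also uses an array and a loop, which may not yet have been covered, purely to avoid repeating the output statement.

```java
// Illustrative sketch: the class and method names are assumptions, not from the lesson.
class ConversionComparisonDemo {
    // Version 1 approach: subtract 32 regardless of what the character is.
    static char bySubtraction(char c) { return (char) (c - 32); }

    // Version 2 approach: let the Character class decide.
    static char byLibrary(char c) { return Character.toUpperCase(c); }

    public static void main(String[] args) {
        char[] samples = { 'a', 'z', '{', 'A' };
        for (char c : samples) {
            System.out.println(c + ": subtraction gives " + bySubtraction(c)
                + ", toUpperCase gives " + byLibrary(c));
        }
    }
}
```

For the lower case letters both approaches agree. For '{' and 'A' they differ: subtraction still shifts the code by 32, whereas toUpperCase returns a character unchanged when it has no upper case equivalent.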
6.5 Testing
Boundary Value Analysis (BVA) Testing: When using input variables that can only take a
particular "range" of values it has been demonstrated that errors often occur at the boundaries
of the input domain. It is for this reason that Boundary Value Analysis (BVA) has been
developed as a testing technique. Boundary value analysis leads to a selection of test cases
that exercise bounding values for data items. At its simplest this involves the derivation of
test cases with values just above and just below the bounding values. Thus suitable boundary
values for the above application will be '`', 'b', 'y' and '{' (the Unicode character code for the
symbol '`' is 96, and that for the symbol '{' is 123).
Limit testing is related to BVA testing, and is concerned with the generation of test cases
to exercise the program when maximum and minimum input values are supplied. In some cases
this may be the maxima/minima for the type, in others this may be the limits of a particular
range that we are interested in ('a' to 'z' in the above case).
An appropriate set of BVA and limit test cases is given in the table below. These test cases
will also serve to test the arithmetic operation of the code with the inclusion of a sample
input value near the middle of the prescribed range (e.g. 'm'). We should also carry out some
random data validation testing.

TEST CASE                               EXPECTED RESULT
Unicode number ("char" equivalent)      Output
96  ('`')                               '@'
97  ('a')                               'A'
98  ('b')                               'B'
109 ('m')                               'M'
121 ('y')                               'Y'
122 ('z')                               'Z'
123 ('{')                               '['
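The test cases in the table can also be run automatically rather than typed in one at a time. The following is a sketch only: the class BoundaryTestDemo and its methods are hypothetical names, the conversion under test is the "subtract 32" rule of Version 1, the mid-range case uses 109 (the code for 'm'), and arrays and a loop are assumed.

```java
// Illustrative sketch: the class and method names are assumptions, not from the lesson.
class BoundaryTestDemo {
    // The conversion under test: subtract 32 from the input code.
    static int convert(int code) { return code - 32; }

    // Run one test case and report whether the actual output matches the expected one.
    static boolean check(int inputCode, char expected) {
        char actual = (char) convert(inputCode);
        System.out.println(inputCode + " -> " + actual
            + (actual == expected ? " ok" : " FAILED"));
        return actual == expected;
    }

    // Run all cases: just outside, on, and just inside each boundary, plus a mid-range value.
    static boolean runAll() {
        int[] inputs = { 96, 97, 98, 109, 121, 122, 123 };
        char[] expected = { '@', 'A', 'B', 'M', 'Y', 'Z', '[' };
        boolean allPassed = true;
        for (int i = 0; i < inputs.length; i++) {
            allPassed = check(inputs[i], expected[i]) && allPassed;
        }
        return allPassed;
    }

    public static void main(String[] args) {
        System.out.println(runAll() ? "all cases passed" : "some cases failed");
    }
}
```

Keeping the inputs and expected outputs in two parallel arrays mirrors the two columns of the test case table, so adding a case means adding one entry to each.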
Some sample output using the above test cases is given in Table 4.
$ java Lower2UpperApp
Input a Unicode value: 96
Character equivalent is: `
Unicode upper case equivalent is: 64
Upper case character is: @

$ java Lower2UpperApp
Input a Unicode value: 97
Character equivalent is: a
Unicode upper case equivalent is: 65
Upper case character is: A

$ java Lower2UpperApp
Input a Unicode value: 98
Character equivalent is: b
Unicode upper case equivalent is: 66
Upper case character is: B

$ java Lower2UpperApp
Input a Unicode value: 109
Character equivalent is: m
Unicode upper case equivalent is: 77
Upper case character is: M

$ java Lower2UpperApp
Input a Unicode value: 121
Character equivalent is: y
Unicode upper case equivalent is: 89
Upper case character is: Y

$ java Lower2UpperApp
Input a Unicode value: 122
Character equivalent is: z
Unicode upper case equivalent is: 90
Upper case character is: Z

$ java Lower2UpperApp
Input a Unicode value: 123
Character equivalent is: {
Unicode upper case equivalent is: 91
Upper case character is: [
Table 4: Sample output
Note that at present, given our current knowledge, we are still not in a position to prevent
undesired inputs!
Further examples of character manipulation are available.
7. THE System.out.flush METHOD
When using System.out.print() to output data, the data is first passed to a temporary
storage area called a buffer, from where it is output to (say) the screen. This arrangement
is known as output buffering and is designed to save processing time; however, it may
cause code to appear to behave in a strange manner. This is because output is not always
passed from the buffer to the screen immediately; the Java interpreter might process some
further lines of code before doing this. To force the buffer to be flushed we can use the method:
System.out.flush();
contained in the PrintStream and PrintWriter classes. For example we might write:
System.out.print("Answer = ");
System.out.flush();
System.out.print(100/5);
This will cause the string "Answer = " to be output before the calculation is undertaken.
The buffer is always flushed whenever a "new line" character is encountered. Therefore when
using System.out.println() the above is not a problem.
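The effect of buffering is hard to see on the screen, so the following sketch makes it measurable instead. It is an illustration under stated assumptions: rather than System.out, it builds its own PrintStream (with automatic flushing switched off) on top of a BufferedOutputStream, and counts how many bytes have reached an in-memory destination before and after flush is called. The class FlushDemo and its method name are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Illustrative sketch: the class and method names are assumptions, not from the lesson.
class FlushDemo {
    // Print text through a buffered PrintStream (auto-flush off) and report how many
    // bytes have reached the destination before and after an explicit flush.
    static int[] bytesBeforeAndAfterFlush(String text) {
        ByteArrayOutputStream destination = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(new BufferedOutputStream(destination), false);
        out.print(text);
        int before = destination.size();   // the text is still held in the buffer
        out.flush();
        int after = destination.size();    // the text has now been passed on
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] sizes = bytesBeforeAndAfterFlush("Answer = ");
        System.out.println("before flush: " + sizes[0] + " bytes, after flush: "
            + sizes[1] + " bytes");
    }
}
```

Nothing arrives at the destination until flush is called, which is the same delay that can make screen output appear later than expected.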
Created and maintained by Pius Nyaanga. Last updated 18 July 2013