Download cosc2007ass2a

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Computational phylogenetics wikipedia , lookup

Diff wikipedia , lookup

Binary search algorithm wikipedia , lookup

Lempel–Ziv–Welch wikipedia , lookup

Transcript
COSC 2007 –Data Structures II
Assignment #2
Due: Feb, 15, 2010
Building a BST
The purpose of this question is to give you experience in implementing a simple spell checking
program using binary search trees. One of the most-used applications of computers today is
checking spelling. In this question, you will load a large dictionary (approximately 173,529
words) into a binary search tree. Then, you will walk through a text document, checking each
word in the document against the dictionary; if the word is present, it is spelled correctly, and if
not, indicate it as misspelled.
In more detail, you will first load the dictionary (formatted in an alphabetical order of words) into
a balanced binary search tree and display the height of the tree. Then read in input file and check
the spelling of each word in the content against the dictionary. You are required to use the Binary
Search Tree data structure to solve the problem. The SpellChecker program will ask user to
enter two pieces of information:
• the file name of the input file to be spell checked
• the file name of the dictionary file
The input file can be any text file. The format of the dictionary file is one word per line and the
words are listed in an alphabetical order. An example of the content in the
dictionary_small.txt is as follows (this is a small dictionary file for testing purpose).
accomplices
arraignments
assemblyman
awkwarder
blowsiest
communications
concessionaire
couldn't
defrauded
dumps
fifteen
honing
incorrigibility
lockout
misting
reeved
splotch
springboard
typing
woolgathering
The program should output the possible misspelled words, and count the total number of words
and as well as the number of misspelled words. Your program (SpellChecker.java) should
behave similarly as follows:
Spell Checker Application
------------------------------What is the input file for spelling check?
test_large.txt
What is the dictionary file?
dictionary_large.txt
The height of the binary search tree is 18.
Spell checking ...done
Possible misspelled words in the file:
Armstrong
France
bruntal
indiviidual
clombing
fuet
L'Alpe
d'Huez
Armstrung
menute
widdening
leed
clusest
challengor
Ivan
impesing
Armstrong
favourite
Sundayy
meake
tims
nooz
Total number of words is 97.
Possible number of misspelled words is 22.
Hints on algorithm:
• Store the dictionary into a balanced binary search tree data structure;
• Use the algorithm taught in the lectures to construct a balanced binary search tree from
the dictionary text file, where the words are sorted by their alphabetical order;
• Read in the words from the input file and check whether they are presented in the
dictionary;
• Convert the word into lowercase before perform the checking. Numerical numbers are
discarded from the spell check.
Notes:
• Your program must use the BinarySearchTree (TreeNode, BinaryTree,
BinaryTreeBasis, TreeException) class to solve the problem. The files
test_small.txt, dictionary_small.txt, test_large.txt and
dictionary_large.txt can be used for testing your program. For this question, you
should submit the SpellChecker.java class.
Hand in your source code and the output after running your program.