Beginners Program Web Page Builders and Verifiers
Martha J. Kosa
Tennessee Technological University
Box 5101
Cookeville, TN 38505
+1 (931) 372-3579
www.csc.tntech.edu/~mjkosa/iticse99
1. ABSTRACT
Many students "surf the Web" in their spare
(or not so spare) time. They may see Web sites
with deep hierarchical structures that were
generated automatically. Some word
processors can save their work in HTML, the
language understood by browsers. The
students may have their own HTML validated
at a validation site. Although browsers,
generators, and validators can be complex
programs, beginning students can implement
some of their features to reinforce their
understanding of basic CS concepts. In this
paper, we describe a set of programming
assignments, with HTML as the unifying
theme, for a data structures class. This set was
used during the Spring 1998 semester. We
also give additional ideas for HTML-related
assignments in introductory classes.
1.1 Keywords
introductory classes, assignments, WWW
2. INTRODUCTION
Tim Berners-Lee started the Web phenomenon in 1991, and
now the Web is ubiquitous. It has dramatically changed the
way the world thinks about computers. It has precipitated a
new, evolving paradigm to help faculty teach [2]. Students
enjoy using Web browsers to investigate topics of academic
and/or personal interest, hopefully with academics having
top priority when necessary. They can access the Web at
any time and from any place, provided
that they have a viable connection to the Internet or have
Web pages stored on their local machines.
Many students want to build their own Web pages, which
are normally written in HTML (HyperText Markup
Language). They can write their own HTML code, or they
can use a program, such as Microsoft FrontPage, to
generate the code. They may also use scanners or drawing
programs to create pictures, which can be incorporated in
their Web pages. To ensure that their Web pages look
reasonably consistent when viewed in different Web
browsers, they may send their pages to a validation site,
which checks their pages for standard HTML usage. These
validators work like parsers, which the students also
encounter when compiling their class programming
assignments.
Writing pure HTML code is a form of programming;
however, it is different from the programming that students
do in their classes, whether the language they use is
procedural, object-oriented, or functional. [7] discusses a
first programming course (actually, a "preprogramming"
course) in which the students use HTML and JavaScript to
build Web pages, one at a time. JavaScript is used as the
vehicle for expressing such algorithmic concepts as
decision, repetition, and function abstraction. [5] describes
an upper-division course that presents various Web
development technologies (HTML, JavaScript, Java, CGI,
and Web databases) in a breadth-first manner.
The main focus of our work is on the data structures course,
using a traditional high-level programming language, and
we have the students write programs which verify existing
Web pages or generate sets of new Web pages as output
files. The output files then become input files to another
program, namely, a Web browser. The output files are
linked to each other via hyperlinks; thus, the students can
navigate through them. This helps illustrate further that
"running programs on other programs is an every-day
occurrence" [1]. Verifying and generating Web pages are
realistic ways to use the standard data structures that they
have learned in class.
The Web can then serve as a unifying theme for the class.
[4] discussed the analysis of Martian planetary images as a
theme for the first programming course. A well-chosen
theme can serve to maintain student interest, even in a
challenging course, because the students can see practical
uses for what they are learning and can infer
interconnections among seemingly unrelated areas.
Our paper is organized as follows. First, we describe a set
of programming assignments, having the Web as a unifying
theme, which were used in an offering of our data structures
course during the Spring 1998 semester. Then we give
some suggestions for additional assignments for a (possibly
more advanced) data structures course. Finally, we close
with some concluding remarks.
3. DATA STRUCTURES ASSIGNMENTS
In our data structures class, we cover the traditional stacks,
queues, lists, binary search trees, and multiway B-trees in
great detail. We discuss how they are implemented,
including comparisons of alternative implementations. For
programming assignments, the students do not have to
rebuild what was discussed in class; we give them the
necessary modules. Sometimes the students may be asked
to implement a new data structure; this allows them to build
on their understanding of the data structures presented in
class. We now describe assignments that show applications
of standard data structures to the Web. We believe that our
approach to the data structures class is in general agreement
with the views advocated in SIGCSE 1998's popular panel
on the future of CS2 [6].
3.1 Stacks
This first assignment shows an application of stacks to the
Web. It consists of two parts.
The first part is as follows. The program will display how a
given HTML file would look in some form of browser. The
HTML file will consist of zero or more ordered and/or
unordered lists. When this option is selected, the program
will ask the user for two file names: an input file containing
the HTML commands, and an output file where the
formatted text will be stored. The <OL> and </OL> tags
are used to open and close ordered (numbered) lists, and the
<UL> and </UL> tags are used to open and close
unordered lists. If the input file is not valid HTML (i.e., a
list is not terminated correctly, there are too many </OL>
tags, or some tags do not match up properly), the program
will give the user an error message indicating which line of
the file caused the problem.
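The tag-matching check in this first part can be sketched with a stack. The following is a minimal Python sketch, not the assignment's reference solution: the function name check_lists and the wording of the error messages are hypothetical, tags are assumed to be case-insensitive, and the formatting of the output text is left out.

```python
import re

# Matches <OL>, </OL>, <UL>, and </UL> in either case.
TAG = re.compile(r"</?(OL|UL)>", re.IGNORECASE)

def check_lists(lines):
    """Return None if all list tags are balanced, else an error message."""
    stack = []  # (tag name, line number) for every list that is still open
    for number, line in enumerate(lines, start=1):
        for match in TAG.finditer(line):
            name = match.group(1).upper()
            if not match.group(0).startswith("</"):
                stack.append((name, number))   # opening tag: push
            elif not stack:
                return f"line {number}: </{name}> has no matching opening tag"
            elif stack[-1][0] != name:
                return f"line {number}: </{name}> does not match <{stack[-1][0]}>"
            else:
                stack.pop()                    # matching closing tag: pop
    if stack:
        name, number = stack[-1]
        return f"line {number}: <{name}> is never closed"
    return None
```

A call such as check_lists(open("input.html")) would report the first offending line, which mirrors the error message that the assignment asks for.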
We now describe the second part. Given an indented input
file, the program will produce the HTML file corresponding
to the file. When this option is selected, the program will
ask the user for two file names: an input file, and an output
file where the HTML output will be stored. The first line of
the input file contains a number indicating the number of
spaces in an indentation unit.
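One possible shape for this second part is sketched below in Python. It assumes details that the assignment text does not fix: every list is emitted as an unordered list, indentation grows by at most one unit at a time, and the function name outline_to_html is hypothetical. The stack records which indentation levels currently have an open list.

```python
def outline_to_html(lines):
    """Convert an indented outline into nested <UL> lists (one tag per entry)."""
    unit = int(lines[0])      # first line: spaces per indentation unit
    out = []
    open_levels = []          # stack of indentation levels with an open <UL>
    for line in lines[1:]:
        if not line.strip():
            continue          # skip blank lines
        level = (len(line) - len(line.lstrip(" "))) // unit
        # Close every list that is deeper than the current line.
        while open_levels and open_levels[-1] > level:
            open_levels.pop()
            out.append("</UL>")
        # Open a new list when the line is deeper than the innermost open list.
        if not open_levels or open_levels[-1] < level:
            open_levels.append(level)
            out.append("<UL>")
        out.append(f"<LI>{line.strip()}")
    # Close whatever is still open at the end of the file.
    while open_levels:
        open_levels.pop()
        out.append("</UL>")
    return out
```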
The first part of the assignment gives the student an
appreciation of the kind of work performed by browsers
and compilers. The second part of the assignment shows
the student what is involved in producing outlines and
tables of contents. Both give realistic applications for
stacks.
3.2 Queues
After discussing various uses and implementations of
queues, we turn our attention to the priority queue. We
discuss the operations that a priority queue should provide
to its users and what each operation needs to do its job.
InitializePQ needs an integer indicating the maximum
number of active priorities, and initializes the priority queue
to be empty. EmptyPQ (respectively, FullPQ) returns a
Boolean indicating whether the priority queue is empty
(respectively, full). InsertPQ needs access to the item to be
inserted in the priority queue, along with an integer
indicating the priority of the item. If the priority queue is
not empty, RemovePQ removes the highest-priority item
and returns it. SizePQ returns the number of items in the
priority queue. PrintPQ displays all items in the priority
queue, from highest priority to lowest priority. We then
discuss how standard queues can be used to implement a
priority queue.
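A minimal Python sketch of this idea keeps one standard queue per priority. Here collections.deque stands in for the queue module handed to the students, the method names are hypothetical counterparts of InitializePQ, EmptyPQ, SizePQ, InsertPQ, RemovePQ, and PrintPQ, a larger integer is assumed to mean a higher priority, and FullPQ is omitted because a deque has no fixed capacity.

```python
from collections import deque

class PriorityQueue:
    def __init__(self, max_priority):
        # queues[p] holds the items with priority p; index 0 is unused.
        self.queues = [deque() for _ in range(max_priority + 1)]

    def empty(self):
        return all(not q for q in self.queues)

    def size(self):
        return sum(len(q) for q in self.queues)

    def insert(self, item, priority):
        if not 1 <= priority < len(self.queues):
            raise ValueError("priority out of range")
        self.queues[priority].append(item)       # enqueue at the rear

    def remove(self):
        # Scan from the highest priority down to the lowest.
        for p in range(len(self.queues) - 1, 0, -1):
            if self.queues[p]:
                return self.queues[p].popleft()  # dequeue from the front
        raise IndexError("remove from an empty priority queue")

    def items(self):
        # All items from highest to lowest priority, first-in first-out within one.
        return [item for p in range(len(self.queues) - 1, 0, -1)
                for item in self.queues[p]]
```

Because each priority has its own first-in, first-out queue, items of equal priority leave in the order in which they arrived.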
The students are given a module
implementing a standard queue, and they then use this
module to implement the operations for a priority queue.
Next, they need to test their priority queue. Here is the
description of the test program that they were asked to
write. Of course, it involves the Web in some way.
The demonstration program will use priority queues (the
abstract data type just implemented) to analyze another kind
of HTML file. The files that the program will analyze will
have lines of the following form:
<H#> text </H#>
where the # denotes a positive integer between 1 and some
known upper bound. The program will need to make sure
that the numbers in the <H#> and </H#> tags are the same
on a given line. The tags are not case-sensitive; i.e., <h2>
and <H2> are equivalent.
The number will indicate the priority of the item. In a Web
browser, the height of the text depends on the number. Standard
HTML renders <H1> as the tallest heading, but for this assignment
a bigger number is taken to indicate taller text. The
program will ask the user for an input file name and an
output file name. Then it will compute the average heading
size for the file, the largest heading size, the smallest
heading size, the number of headings of each size (from
largest to smallest), and display the text, grouped by
heading sizes from largest to smallest. The headings with
the same heading size will be displayed in the order in
which they were found in the file. All output will be written
to the output file. The program is to use the priority queue
operations to complete each of these tasks.
This assignment shows an application of priority queues in
grouping.
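The analysis can be sketched in Python as follows. This is an illustration under stated assumptions rather than the assigned solution: it uses the standard heapq module as the priority queue instead of the students' own module, the function name analyze and the shape of the returned dictionary are hypothetical, and, following the assignment text, a larger number is treated as a larger heading size.

```python
import heapq
import re
from collections import Counter

# One heading per line; tag names are matched without regard to case.
HEADING = re.compile(r"<H(\d+)>(.*)</H(\d+)>\s*$", re.IGNORECASE)

def analyze(lines):
    heap = []
    for order, line in enumerate(lines):
        match = HEADING.match(line.strip())
        if match is None or match.group(1) != match.group(3):
            raise ValueError(f"line {order + 1}: malformed heading")
        size = int(match.group(1))
        # Negate the size so the largest heading is removed first; the line
        # order breaks ties, so equal sizes keep their order in the file.
        heapq.heappush(heap, (-size, order, match.group(2).strip()))
    if not heap:
        raise ValueError("no headings found")
    grouped = []
    while heap:
        negative_size, _, text = heapq.heappop(heap)
        grouped.append((-negative_size, text))
    sizes = [size for size, _ in grouped]
    return {
        "average": sum(sizes) / len(sizes),
        "largest": sizes[0],
        "smallest": sizes[-1],
        "counts": Counter(sizes),
        "grouped": grouped,
    }
```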
3.3 Lists
Stacks and queues are specialized lists, where additions and
removals occur at only one end for the stack and at opposite
ends for the queue. It is not always convenient to use stacks
and queues for solving problems, so we discuss generalized
lists, where additions and removals can occur at arbitrary
places. In class, we compare two alternative
implementations, the contiguous and linked, and use a
module of the basic list operations to implement more
operations. The students now have the tools they need to
complete the next Web-related assignment, which is a
variation of the assignment using stacks. The list items
corresponding to each ordered list or unordered list must be
placed in the output file in alphabetical order. The students
can assume an upper limit on the number of different
indentation levels, and there will be at most one list at each
indentation level. This assignment could be extended to
handle an arbitrary amount of nesting. Another variation
would be to generate a new file, whenever a new list is
started, and to add a link to it in the higher-level output file.
This assignment gives the students a chance to implement a
basic sorting algorithm, without having to know how the
underlying list is implemented. They often see the
necessity of ordering, such as in maintaining student
records.
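Sorting through the list interface alone can be sketched as below. The class ContiguousList is only a stand-in for the module of basic list operations, and its method names (size, retrieve, insert) are hypothetical. The function insert_in_order never looks inside the list, so it would work unchanged with a linked implementation that offers the same operations.

```python
class ContiguousList:
    """Stand-in for the provided list module; the method names are hypothetical."""
    def __init__(self):
        self._items = []

    def size(self):
        return len(self._items)

    def retrieve(self, position):
        return self._items[position]

    def insert(self, position, item):
        self._items.insert(position, item)

def insert_in_order(lst, item):
    """Insert item so that the list stays in alphabetical order (ignoring case)."""
    position = 0
    while position < lst.size() and lst.retrieve(position).lower() <= item.lower():
        position += 1
    lst.insert(position, item)

def contents(lst):
    """Read the list back through the same interface."""
    return [lst.retrieve(i) for i in range(lst.size())]
```

Inserting every list item this way amounts to an insertion sort built only from the list operations.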
3.4 Binary Search Trees
The binary tree is typically the first example of a nonlinear
data structure that students see. They study the traditional
preorder, inorder, and postorder traversal techniques,
yielding another application of recursion. They learn how
to build binary search trees, which have the potential for
faster searching times than the exhaustive search required in
an unordered structure.
After an introduction to binary search trees and traversals,
the students can complete the following assignment.
The input to the program will be the name of a text file,
with the following record format: a line consisting of the
name of a word or place, followed by a line containing a
description. The program will then construct a set of Web
pages corresponding to the file.
Each student will need to build a binary search tree from
the entries to help in constructing the set of Web pages.
The Web pages will have the following format:
name of word or place centered and in bold text
the description in a paragraph
a picture corresponding to the word
a link to the parent file, if present
links to the children files, if present
Each student will also need to create a root Web page,
which has a link to the Web page corresponding to the item
stored in the root of the tree.
Although we generated test cases for the students during the
Spring 1998 semester, students could participate in
generating test cases for the program because they can use
scanners or drawing programs to produce picture files.
Perhaps they could produce a class directory.
This assignment further illustrates the power of recursion
because a small recursive function can generate many Web
pages. It also introduces the students to several new HTML
tags.
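A small recursive generator of this kind might look like the following Python sketch. Several details are assumptions made for illustration: pages are collected in a dictionary instead of being written to disk, each page is named after its entry with ".html" appended, the picture is assumed to be a file with the same name and a ".gif" extension, and the separate root Web page is omitted.

```python
class Node:
    def __init__(self, name, description):
        self.name, self.description = name, description
        self.left = self.right = None

def insert(root, name, description):
    """Standard binary search tree insertion keyed on the name."""
    if root is None:
        return Node(name, description)
    if name < root.name:
        root.left = insert(root.left, name, description)
    else:
        root.right = insert(root.right, name, description)
    return root

def make_pages(node, parent, pages):
    """Recursively add one HTML page per tree node to the pages dictionary."""
    if node is None:
        return
    html = [f"<CENTER><B>{node.name}</B></CENTER>",
            f"<P>{node.description}</P>",
            f'<IMG SRC="{node.name}.gif">']
    if parent is not None:
        html.append(f'<A HREF="{parent.name}.html">Parent: {parent.name}</A>')
    for child in (node.left, node.right):
        if child is not None:
            html.append(f'<A HREF="{child.name}.html">{child.name}</A>')
        make_pages(child, node, pages)   # recursion handles the whole subtree
    pages[f"{node.name}.html"] = "\n".join(html)
```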
3.5 Multiway Trees
After the students learn about binary trees, they study trees
with higher branching factors if time permits. The B-tree,
in which all leaves are at the same level, is a typical
example.
When the discussion of multiway trees is complete, the
students can complete the following assignment. The
assignment is a variation of the assignment described in
Section 3.3.
The input to the program will be the name of an indented
input file. The program is to produce a set of HTML files
corresponding to the file. The number of files generated is
equal to the number of lines in the file. The first line of the
file contains a number indicating the number of spaces in an
indentation unit. The program will also tell the user how
many of the HTML files are "under construction".
Each student will need to build a multiway tree from the
entries in the file to help in constructing the set of Web
pages.
The name of the root Web page will be name.html, if
name.txt is the name of the input file. The name of each
non-root Web page file will be topic.html, where topic
corresponds to the textual information, with all spaces
removed, on the relevant line of the input file.
The Web pages will have the following format:
name of original topic centered and in bold text (not
  included for the root Web page)
unordered list corresponding to any subtopics, where the
  list items are in alphabetical order and include links to the
  Web pages corresponding to the subtopics
an indication of "under construction" if there are no
  subtopics
the parent topic and a link to its file, if present
For each HTML file generated, there is a known upper
bound on the number of links to HTML files in a list.
This assignment gives the students the chance to implement
and use a new data structure. They get practice in building
hierarchies, such as course catalogs and encyclopedic
structures.
They can again work with sorting and
appreciate the power of recursion.
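The following Python sketch shows one way the pieces could fit together. It assumes a well-formed input with no blank lines and with indentation that deepens by one unit at a time; the names Topic, build_tree, and make_pages are hypothetical, and the pages are collected in a dictionary rather than written to files.

```python
class Topic:
    def __init__(self, text, parent=None):
        self.text, self.parent, self.children = text, parent, []

    def filename(self):
        return self.text.replace(" ", "") + ".html"   # spaces removed

def build_tree(root_name, lines):
    """Build a multiway tree; lines[0] is the number of spaces per indentation unit."""
    unit = int(lines[0])
    root = Topic(root_name)
    path = [root]          # path[k] is the most recent node at depth k
    for line in lines[1:]:
        depth = (len(line) - len(line.lstrip(" "))) // unit + 1
        node = Topic(line.strip(), parent=path[depth - 1])
        path[depth - 1].children.append(node)
        del path[depth:]   # forget deeper nodes from earlier branches
        path.append(node)
    return root

def make_pages(node, pages):
    """Recursively generate one page per node, with subtopics in alphabetical order."""
    html = []
    if node.parent is not None:
        html.append(f"<CENTER><B>{node.text}</B></CENTER>")
    if node.children:
        html.append("<UL>")
        for child in sorted(node.children, key=lambda c: c.text.lower()):
            html.append(f'<LI><A HREF="{child.filename()}">{child.text}</A>')
            make_pages(child, pages)
        html.append("</UL>")
    else:
        html.append("<P>Under construction</P>")
    if node.parent is not None:
        html.append(f'<A HREF="{node.parent.filename()}">{node.parent.text}</A>')
    pages[node.filename()] = "\n".join(html)
```

For an input file catalog.txt, the root would be built with build_tree("catalog", lines), and the number of pages "under construction" could be found by counting the generated pages that contain that phrase.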
4. MORE DATA STRUCTURES IDEAS
In the previous section, we described several assignments
from our Spring 1998 offering of the data structures course,
which integrate the Web with standard topics from the
course. In this section, we describe more ideas for
assignments related to sorting algorithms, trees, graphs, and
hashing.
4.1 Sorting Algorithms
Sorting algorithms are often compared and contrasted in the
data structures course. Sorting algorithms which work
based on comparing items to each other are doomed to take
Ω(n log n) operations in the worst case, where n is the number of items to be
sorted. Algorithms typically presented include bubble sort,
selection sort, insertion sort, merge sort, quick sort, tree
sort, and heap sort. In linear-time sorting algorithms,
comparisons between items are forbidden; thus, there must
be some other restrictions on the items (e.g., their values
must fall within a particular range). Radix sort is often
presented as an example of a linear-time sorting algorithm.
The popular and exhaustive Cormen, Leiserson, and Rivest
algorithms text [3] describes another linear-time sorting
algorithm, counting sort, which works using a tally to count
the number of occurrences of a particular item.
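The tallying idea can be illustrated with a short Python sketch. This is a simplified variant that sorts bare integer keys, not the stable procedure presented in [3]; the function name and the parameter upper (the largest permitted value) are illustrative.

```python
def counting_sort(values, upper):
    """Sort integers in the range 0..upper without comparing items to each other."""
    tally = [0] * (upper + 1)
    for value in values:
        tally[value] += 1              # count the occurrences of each value
    result = []
    for value, count in enumerate(tally):
        result.extend([value] * count) # emit each value as often as it occurred
    return result
```

The running time is proportional to the number of items plus the size of the value range, which is why the restriction on the range matters.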
We now describe a Web-related assignment dealing with
sorting algorithms. When students use a browser to visit
Web pages, they are causing the pages to get "hits".
Systems have log files to maintain statistics on the number
of hits received. Sorting can be used to organize this
information, such as in ranking the pages at a site according
to the number of hits that they have received. The students
could write a program to read a log file and rank the pages,
from the highest number of hits to the lowest. Their
program could then produce a Web page containing the
information in graphical form. They could generate a
primitive bar chart using simple picture files. They could
also compare different sorting algorithms, doing a timing
analysis, with a graphical comparison.
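A minimal Python sketch of the ranking step follows. The log format is an assumption made for illustration (one requested page per line); real server logs carry more fields. The bar image name bar.gif and both function names are hypothetical.

```python
from collections import Counter

def rank_pages(log_lines):
    """Return (page, hits) pairs from the most hits to the fewest."""
    hits = Counter(line.strip() for line in log_lines if line.strip())
    # Sort by descending hit count, then by page name to break ties.
    return sorted(hits.items(), key=lambda pair: (-pair[1], pair[0]))

def bar_chart_html(ranking, bar_image="bar.gif"):
    """Build a primitive bar chart: one copy of the image per hit."""
    rows = ["<TABLE>"]
    for page, count in ranking:
        bars = f'<IMG SRC="{bar_image}">' * count
        rows.append(f"<TR><TD>{page}</TD><TD>{bars} {count}</TD></TR>")
    rows.append("</TABLE>")
    return "\n".join(rows)
```

The built-in sort could be replaced by any of the algorithms discussed in class for the timing comparison.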
4.2 AVL Trees
Standard binary search trees are sensitive to the order in
which items are added to them. If the items are already in
increasing or decreasing order when they are placed in the
tree, the tree degenerates into a linear structure, causing
inefficient searches. In AVL trees, the tree is restructured
when it starts to get too unbalanced, thus maintaining
efficient search times. Each item in the tree includes a
balance factor, which indicates the difference between the
heights of the item's left and right subtrees. For each item,
the difference between the heights of the left and right
subtrees can be no more than 1. Two kinds of rotations
(single or double) serve to restructure the tree. Many data
structures textbooks include a discussion of AVL trees.
We could modify the assignment from Section 3.4 (binary
search trees) to include the balance factors in the Web
pages that are generated. Perhaps pictures of a balance
scale could be used.
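Computing the balance factors for such pages could be sketched as follows in Python. The sketch uses a plain binary search tree without rotations, takes the balance factor to be the left height minus the right height (the sign convention is an assumption), and recomputes heights on every call for simplicity rather than efficiency.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Plain binary search tree insertion, with no rebalancing."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

def balance_factors(node, out=None):
    """Map each key to height(left subtree) - height(right subtree)."""
    out = {} if out is None else out
    if node is not None:
        out[node.key] = height(node.left) - height(node.right)
        balance_factors(node.left, out)
        balance_factors(node.right, out)
    return out

def is_avl(node):
    """True if every balance factor is -1, 0, or 1."""
    return all(abs(b) <= 1 for b in balance_factors(node).values())
```

Inserting keys in increasing order produces a chain whose root has a balance factor of magnitude greater than 1, which is exactly the situation that AVL rotations repair.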
4.3 Trees
In Section 3.5 (multiway trees), we described an assignment
that produced a set of Web pages from an indented text file.
We present here another Web-related assignment dealing
with trees, in which the students analyze multiple Web
pages to produce an index for the set of Web pages,
collecting all the links in a single repository. This index
serves the purpose of a table of contents.
The input to the program would be a starting Web page. If
a given Web page has links in it, the files corresponding to
those links would be read and analyzed. The links would
be used to build a tree. After all links have been exhausted,
the tree is complete, and the process of building the index
can begin.
This assignment is similar to the directory printing example
from [1].
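The link-following step can be sketched with Python's standard html.parser module. To keep the example self-contained, the pages are supplied as a dictionary from file name to HTML text instead of being read from disk; the names LinkCollector and build_index are hypothetical, and a page that has already been visited is skipped so that cyclic links do not cause endless recursion.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the HREF targets of all anchor tags in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def build_index(start, pages):
    """Return an indented table of contents reachable from the start page."""
    index, seen = [], {start}

    def visit(name, depth):
        index.append("  " * depth + name)
        collector = LinkCollector()
        collector.feed(pages.get(name, ""))
        for link in collector.links:
            if link not in seen:          # each page appears once in the tree
                seen.add(link)
                visit(link, depth + 1)

    visit(start, 0)
    return index
```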
4.4 Graphs
The World-Wide Web can be considered to be a graph,
where the directed edges are the hyperlinks between Web
pages. A standard adjacency matrix or adjacency list can
be transformed into a set of Web pages, which the students
can navigate. The standard graph algorithms can also be
applied to the Web. The students could build a minimum
spanning tree for a set of Web pages, causing a new set of
pages to be generated with the minimum number of
hyperlinks (and associated costs) such that all pages in the
set are still reachable from each other. The students could
determine the shortest path from a given Web page to
another given Web page. They could perform depth-first
and breadth-first traversals for a set of Web pages to ensure
that all pages are visited in a systematic fashion.
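As one illustration, the shortest path between two pages can be found with a breadth-first traversal when every hyperlink is assumed to have the same cost. In this Python sketch the adjacency list is a dictionary from page name to the pages it links to, and the function name shortest_path is hypothetical.

```python
from collections import deque

def shortest_path(links, start, goal):
    """Return the list of pages on a fewest-links path, or None if unreachable."""
    previous = {start: None}       # page -> page from which it was first reached
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            path = []
            while page is not None:    # walk back to the start
                path.append(page)
                page = previous[page]
            return path[::-1]
        for neighbor in links.get(page, []):
            if neighbor not in previous:
                previous[neighbor] = page
                queue.append(neighbor)
    return None
```

Because the edges are directed, a path from one page to another does not imply a path in the opposite direction.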
4.5 Hashing
Hashing is a way to improve the efficiency of searching,
provided that the number of collisions when adding items to
the hash table is not too high. A hash table could be
used to construct a rudimentary search engine program,
giving the students yet another application of data structures
concepts to the Web.
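A rudimentary index of this kind is sketched below in Python. The table uses separate chaining and a deliberately simple hash function (the sum of the character codes) so that collisions are easy to observe; the class name HashIndex, the bucket count, and the whitespace-based word splitting are all illustrative assumptions.

```python
class HashIndex:
    def __init__(self, buckets=11):
        self.table = [[] for _ in range(buckets)]   # each bucket is a chain

    def _bucket(self, word):
        return self.table[sum(map(ord, word)) % len(self.table)]

    def add(self, word, page):
        """Record that the word occurs on the page."""
        word = word.lower()
        for key, pages in self._bucket(word):
            if key == word:
                pages.add(page)
                return
        self._bucket(word).append((word, {page}))   # new word joins the chain

    def search(self, word):
        """Return the sorted names of the pages that contain the word."""
        word = word.lower()
        for key, pages in self._bucket(word):
            if key == word:
                return sorted(pages)
        return []

def index_pages(pages):
    """Build an index from a dictionary of page name -> plain text."""
    index = HashIndex()
    for name, text in pages.items():
        for word in text.split():
            index.add(word, name)
    return index
```

Words such as "ab" and "ba" land in the same bucket under this hash function, so the chain search is what keeps their entries separate.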
5. CONCLUSION
We have seen the proliferation of the World Wide Web in
the past few years. Every day, we see references to the
Web in newspaper and magazine advertisements and in
television commercials. The World Wide Web relies on
powerful hardware to send its information to home and
school computers. However, much of the information that
is sent is in the relatively simple format of HTML files,
which some other programs may have automatically
generated. Students today need to obtain a good
understanding of the Web and traditional concepts from
data structures. In this paper, we presented some ideas for
programming assignments, which use the Web to reinforce
data structures concepts. In these assignments, students
generate Web pages and/or verify the correctness of the
pages. These assignments are suitable for either open or
closed laboratory settings. The Web browser used by the
students to view their generated pages can help in the
debugging process.
6. REFERENCES
[1] Astrachan, O. Self-Reference is an Illustrative
Essential. Proceedings of the Twenty-Fifth SIGCSE
Technical Symposium on Computer Science Education
(Phoenix AZ, March 1994), ACM Press, 238-242.
[2] Boroni, C.M., Goosey, F.W., Grinder, M.T., and Ross,
R.J. A Paradigm Shift! The Internet, the Web,
Browsers, Java, and the Future of Computer Science
Education. Proceedings of the Twenty-Ninth SIGCSE
Technical Symposium on Computer Science Education
(Atlanta GA, February 1998), ACM Press, 145-152.
[3] Cormen, T.H., Leiserson, C.E., and Rivest, R.L.
Introduction to Algorithms. MIT Press, 1990.
[4] Fell, H.J. and Proulx, V.K. Exploring Martian
Planetary Images: C++ Exercises for CS1. Proceedings
of the Twenty-Eighth SIGCSE Technical Symposium
on Computer Science Education (San Jose CA,
February 1997), ACM Press, 30-34.
[5] Lim, B.B.L. Teaching Web Development
Technologies in CS/IS Curricula. Proceedings of the
Twenty-Ninth SIGCSE Technical Symposium on
Computer Science Education (Atlanta GA, February
1998), ACM Press, 107-111.
[6] McCracken, D.D., Dale, N., Wolz, U., Berman, M.,
and Astrachan, O. Possible Futures for CS2.
Proceedings of the Twenty-Ninth SIGCSE Technical
Symposium on Computer Science Education (Atlanta
GA, February 1998), ACM Press, 357-358.
[7] Mercuri, R., Herrmann, N., and Popyack, J. Using
HTML and JavaScript in Introductory Programming
Courses. Proceedings of the Twenty-Ninth SIGCSE
Technical Symposium on Computer Science Education
(Atlanta GA, February 1998), ACM Press, 176-180.