Page Modification Systems Survey Paper
Draft #2; August 31, 1999
Andrew Kensler
Possible Applications
Advertisements, banners and other obtrusive elements on the web often annoy
people. Blocking these is one potential application for a page modification system. The
filter may be able to simply remove the code in the HTML that invokes these, or it could
change these to something more innocuous, such as a simple, unobtrusive text hyperlink.
There are currently many systems that perform this service.
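The choice between removing a banner and replacing it with a text link can be sketched as a single text filter. The Python sketch below assumes a hypothetical list of advertising hosts (`AD_HOSTS`). It rewrites any `<img>` tag served from one of those hosts into a short text link. It uses a regular expression rather than a real HTML parser, so it recognizes only double-quoted `src` attributes.

```python
import re

# Hypothetical advertising hosts; a real filter would read these from its configuration.
AD_HOSTS = ("ads.example.com", "banners.example.net")

# Matches a whole <img ...> tag and captures its double-quoted src attribute.
IMG_TAG = re.compile(r'<img\b[^>]*\bsrc="([^"]*)"[^>]*>', re.IGNORECASE)


def neutralize_ads(html: str) -> str:
    """Replace banner images from known ad hosts with a small text link."""

    def replace(match: re.Match) -> str:
        src = match.group(1)
        if any(host in src for host in AD_HOSTS):
            # Keep an unobtrusive link so the user can still reach the image.
            return f'<a href="{src}">[ad]</a>'
        return match.group(0)  # ordinary images pass through unchanged

    return IMG_TAG.sub(replace, html)


page = '<p>News</p><img src="http://ads.example.com/b.gif" width="468"><img src="/logo.gif">'
print(neutralize_ads(page))
```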
Similarly, web filter systems may be used to try to block objectionable
content on the web and so protect children. These filters can deny access to certain
pages and images and may alter the text on a page to sanitize it. Some even apply
special modifications to clean the result pages returned by search engines.
More ambitiously, the filter system could support services such as annotations.
Much the same way that a person marks a paper with highlighting and notes in the
margins, these systems allow users to make notes marking up the contents of web pages.
In the case of shared annotations, other users may read these and respond. The annotation
system would rely on the page modification system to provide the infrastructure for
adding the annotations to the web pages.
Alternatively, a page modification system could support programs that translate
documents into HTML from another format. For example, if users prefer their own
system of markup, they could provide a list of rules for transforming it into HTML, and
the filter would perform the transformation on the fly. This adds flexibility, because the
style and look of innumerable pages could be changed instantly simply by changing the
list of rules.
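A rule list of this kind can be as simple as an ordered sequence of pattern/replacement pairs. In the sketch below, the markup conventions (`== heading ==`, `*bold*`, `_italic_`) are invented for illustration. Editing the `RULES` list changes the output for every page written in that markup.

```python
import re

# Hypothetical user-defined markup rules, applied in order.
RULES = [
    (re.compile(r"^== (.+) ==$", re.MULTILINE), r"<h2>\1</h2>"),  # == Heading ==
    (re.compile(r"\*(.+?)\*"), r"<b>\1</b>"),                      # *bold*
    (re.compile(r"_(.+?)_"), r"<i>\1</i>"),                        # _italic_
]


def to_html(text: str, rules=RULES) -> str:
    """Translate text written in the private markup into HTML, rule by rule."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


print(to_html("== Notes ==\nThis is *bold* and _italic_."))
```

Because the rules run in sequence, each later rule sees the output of the earlier ones. A rule set therefore has to be ordered with care, for example so that one rule does not rewrite HTML produced by another.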
All of these possible applications depend on a system that filters and modifies pages as
they travel from the source server to the user. They would benefit from a common
infrastructure that frees their programmers from worrying about the details of HTTP and
network transport and lets them concentrate on how they wish to modify the pages.
Desired Features
Plugin architectures enhance systems by dynamically adding functionality to
programs. By developing a module that adheres to a plugin specification, developers can
extend the capabilities of a program quickly and easily. These plugins are typically
loaded by the system as it starts up.
In the case of a filter system, individual filters would be developed as plugin
modules. The plugin architecture confers several benefits. For one, it encapsulates the
filters and isolates each from the others, so multiple filters may be applied to a page
without any of them having to know about the others. Additionally, it simplifies
development by freeing authors from having to know the full details of the interaction
between the filter system and the clients and servers; simple plugins could be entirely
unaware of it. It also benefits users, who typically could add and use new filters simply
by copying the filter plugin into the proper directory. Overall, the plugin design offers
many benefits to both users and filter developers.
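Below is a minimal Python sketch of such a plugin contract. It assumes a hypothetical `Filter` base class whose `apply` method sees only the page URL and text, never the HTTP details. `FilterChain` runs the registered filters in order, so each filter can be written without knowing which others are installed. The two example plugins are illustrative only.

```python
from abc import ABC, abstractmethod


class Filter(ABC):
    """Hypothetical plugin contract: a filter sees the page, never the HTTP details."""

    @abstractmethod
    def apply(self, url: str, html: str) -> str:
        """Return a (possibly modified) copy of the page at `url`."""


class FilterChain:
    """Applies independent filters in registration order."""

    def __init__(self) -> None:
        self._filters: list = []

    def register(self, plugin: Filter) -> None:
        self._filters.append(plugin)

    def run(self, url: str, html: str) -> str:
        for plugin in self._filters:
            html = plugin.apply(url, html)  # each filter sees the previous one's output
        return html


class StripBlink(Filter):
    def apply(self, url: str, html: str) -> str:
        return html.replace("<blink>", "").replace("</blink>", "")


class AddFooter(Filter):
    def apply(self, url: str, html: str) -> str:
        return html + f"<hr><small>Filtered copy of {url}</small>"


chain = FilterChain()
chain.register(StripBlink())
chain.register(AddFooter())
print(chain.run("http://example.com/", "<blink>Hello</blink>"))
```

In a deployed system, the explicit `register` calls would be replaced by discovering plugin modules in a directory at startup. That step is omitted here.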
Transparency is another useful feature. Many people might be intimidated by a
system that exposes the mechanisms of page modification or requires complicated
procedures to use. As much as possible, the filter system should work behind the scenes,
visible only through the filtering it performs while users browse the Web.
Support for multiple users benefits filter authors who might want to develop
annotation systems. While annotations may be of some limited use when a single person
creates and views them, they gain significant power when shared among multiple users,
allowing users to reply and post feedback on other annotations.
To really support multiple users, a page modification system should keep track of
users and user profiles through separate accounts for each user, allowing them to log in to
use the system and receive the benefits of an individual account. The user could
customize their account, personalizing the pages to their own taste instead of being
forced into global settings that apply uniformly to all users. This would allow the user a
very personalized web browsing experience while still offering the benefits of interaction
through a shared system.
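Per-user customization could be modeled as a profile that lists which installed filters apply to each account. The sketch below assumes hypothetical filter names and a hard-coded `PROFILES` table. A real system would authenticate the user and store the profiles persistently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Filters installed on the shared system, keyed by hypothetical names.
INSTALLED: Dict[str, Callable[[str], str]] = {
    "no-blink": lambda html: html.replace("<blink>", "").replace("</blink>", ""),
    "no-marquee": lambda html: html.replace("<marquee>", "").replace("</marquee>", ""),
}


@dataclass
class Profile:
    """Per-user settings; the fields are illustrative assumptions."""
    user: str
    enabled: List[str] = field(default_factory=list)


# In practice these would be loaded after the user logs in.
PROFILES = {
    "alice": Profile("alice", ["no-blink"]),
    "bob": Profile("bob", ["no-blink", "no-marquee"]),
}


def filter_for(user: str, html: str) -> str:
    """Apply only the filters this user enabled; unknown users get no filtering."""
    profile = PROFILES.get(user, Profile(user))
    for name in profile.enabled:
        html = INSTALLED[name](html)
    return html


page = "<blink>A</blink><marquee>B</marquee>"
print(filter_for("alice", page))
print(filter_for("bob", page))
```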
Ideally, this customization could be done remotely, in case the user is on a
different machine from the one running the filter system and has no physical access to it.
Users should be able to customize their accounts from any machine, perhaps using their
Web browser itself as the interface.
While these features benefit the user exploring the Web, a page modification system
should also accommodate site authors. Authors may wish to request certain modifications
that enhance the presentation of their pages, or to deny the use of certain filters or
changes. For example, an author worried about lewd comments in annotations might want
to disable annotations for their page, and a page that depends on banners for a modest
income might want to disable ad-blocking filters. On the other hand, a page that solicits
feedback might use an annotation filter to provide this capability and request that users
view the page with the appropriate annotation filter. Giving authors control over the
modifications to their pages also helps mitigate intellectual property concerns, since
authors may simply disable all modifications if they do not want their pages changed at all.
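One way an author could express these wishes is a declaration embedded in the page itself. The sketch assumes a hypothetical `<meta name="filter-policy">` tag with `deny:` and `request:` clauses; no such standard exists. It shows how a filter system might reconcile that declaration with the user's chosen filters. It treats `deny: *` as "do not modify this page at all."

```python
from html.parser import HTMLParser
from typing import List


class PolicyReader(HTMLParser):
    """Reads a hypothetical <meta name="filter-policy" content="deny: a, b; request: c">."""

    def __init__(self) -> None:
        super().__init__()
        self.deny: set = set()
        self.request: set = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta" or attrs.get("name") != "filter-policy":
            return
        for clause in (attrs.get("content") or "").split(";"):
            key, _, names = clause.partition(":")
            target = {"deny": self.deny, "request": self.request}.get(key.strip())
            if target is not None:
                target.update(n.strip() for n in names.split(",") if n.strip())


def choose_filters(html: str, user_filters: List[str]) -> List[str]:
    """Drop filters the author denies, then add any the author requests."""
    reader = PolicyReader()
    reader.feed(html)
    if "*" in reader.deny:  # the author forbids any modification
        return []
    chosen = [f for f in user_filters if f not in reader.deny]
    return chosen + sorted(reader.request - set(chosen) - reader.deny)


page = ('<html><head><meta name="filter-policy" '
        'content="deny: adblock; request: annotate"></head></html>')
print(choose_filters(page, ["adblock", "no-blink"]))
```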
Architecture
Several mechanisms currently exist to filter web pages as they travel between the
client and the server. One possible method uses a centralized CGI server, another relies
on proxy servers, and still others use server-side or client-side architectures.
The CGI mechanism functions by having the browser encode the URL of the page
to filter as a CGI request. The web server running the filter passes this to the CGI filter
program which can decode the URL, fetch the page from the source server as though it
were a normal client, modify the page, and return it to the filter's web server which
returns it to the client. Typically, the filter will alter the hyperlinks on the page to redirect
them back through itself, so that any links followed will also come through the CGI filter.
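The URL encoding and link rewriting can be sketched with Python's standard `urllib.parse` module. `FILTER_CGI` and its `url` query parameter are hypothetical. Fetching and modifying the page are left out so that only the redirection of URLs is shown.

```python
import re
from urllib.parse import parse_qs, quote, urljoin, urlsplit

# Hypothetical location of the filtering CGI program.
FILTER_CGI = "http://filter.example.org/cgi-bin/filter"

# Double-quoted href attributes only; a sketch, not a full HTML parser.
HREF = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def via_filter(target: str) -> str:
    """Encode the page URL as a CGI query to the filter."""
    return f"{FILTER_CGI}?url={quote(target, safe='')}"


def target_of(request_url: str) -> str:
    """Decode the page URL that the browser asked the filter to fetch."""
    return parse_qs(urlsplit(request_url).query)["url"][0]


def rewrite_links(html: str, page_url: str) -> str:
    """Send every followed link back through the filter."""

    def replace(match: re.Match) -> str:
        absolute = urljoin(page_url, match.group(1))  # resolve relative links first
        return f'href="{via_filter(absolute)}"'

    return HREF.sub(replace, html)


print(rewrite_links('<a href="next.html">Next</a>', "http://example.com/dir/page.html"))
```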
Proxy servers rely on the cooperation of the client software to filter pages. The
client program, typically a web browser, is configured to send all of its requests for
pages to the proxy server. The proxy server relays each request to the source server and
gets the page back in response. It can then modify the page before returning it to the
client. In effect, the proxy appears as a server to the web client and as a client to the
source server. This scheme has the advantage of passing forms and CGI data through to
the source server, whereas a CGI filter may interfere with them. It is also fairly
transparent to the user.
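When a browser is configured to use a proxy, it sends the full URL in the HTTP request line, for example `GET http://example.com/index.html HTTP/1.0`, rather than just the path. The sketch below shows the first step a minimal HTTP proxy might take. It extracts the source server's host and port and rewrites the request line into the form the source server expects. It handles only plain `http` URLs and omits the network relay itself.

```python
from urllib.parse import urlsplit


def parse_proxy_request(request_line: str):
    """Return (method, host, port, forwarded request line) for a proxy request.

    Only plain http URLs are handled; everything else raises ValueError.
    """
    method, target, version = request_line.split()
    parts = urlsplit(target)
    if parts.scheme != "http":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    path = parts.path or "/"  # "http://host" means the root page
    if parts.query:
        path += "?" + parts.query
    # The source server expects only the path, not the full URL.
    return method, parts.hostname, parts.port or 80, f"{method} {path} {version}"


print(parse_proxy_request("GET http://example.com:8080/a?b=1 HTTP/1.0"))
```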
While these rely on an intermediate server to handle the data, some filtering can
be done at either the client or the server side. In the case of the server, it may use CGI to
customize pages for each user. Or it may use something like Microsoft's Active Server
Pages (ASP) for the same effect. On the client side, Java applets, JavaScript, ActiveX
controls or browser plugins may allow for the browser itself to filter the pages for the
user. While the data downloaded by the browser would be the same, these would modify
the way that the browser renders them for the user.
Description of Existing Systems
Muffin
Mark R. Boyns developed the Muffin system as part of a thesis project at San
Diego State University. He designed it to filter annoying elements out of WWW pages
and to increase security by stripping information about the user's identity that the
browser client transmits.
Muffin consists of 21,000 lines of pure Java code distributed across 128 classes
and 12 interfaces. Because it is pure Java, it runs on any system with a Java 1.1 runtime
environment.
Muffin is available both in binary form and in source form, released under the GNU
General Public License.
Individual filters descend from a base filter class and any number of them can be
invoked to alter a page. Current filters distributed with Muffin offer the ability to block
advertisements, remove Java applets and Javascript, stop animated gifs and remove
cookies, among other things. Beyond these, Muffin can easily be extended with additional
filters written in accordance with its API.
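Two filters of the kinds listed above can be sketched in a few lines. This is not Muffin's API: Muffin's filters are Java classes descending from its base filter class. The Python below only illustrates the idea and assumes headers arrive as a list of name/value pairs. The set of identifying headers is an illustrative choice. The regular expression for scripts and applets is a rough approximation, not a complete HTML parser.

```python
import re

# Illustrative choice of headers that identify or track the user.
REQUEST_STRIP = {"cookie", "referer", "from", "user-agent"}
RESPONSE_STRIP = {"set-cookie"}

# Rough match for <script>...</script> and <applet>...</applet> blocks.
ACTIVE_CONTENT = re.compile(r"<(script|applet)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)


def strip_headers(headers, banned):
    """Drop headers whose names (case-insensitive) are in `banned`."""
    return [(name, value) for name, value in headers if name.lower() not in banned]


def strip_active_content(html: str) -> str:
    """Remove embedded scripts and Java applets from the page."""
    return ACTIVE_CONTENT.sub("", html)


request = [("Host", "example.com"), ("User-Agent", "Mozilla/4.0"), ("Cookie", "id=42")]
print(strip_headers(request, REQUEST_STRIP))
print(strip_active_content('<p>a</p><SCRIPT>alert(1)</SCRIPT><p>b</p>'))
```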
Muffin provides a remote administration feature, whereby a user can change
global preferences while it is running through a browser interface in which the system
responds to virtual URL addresses.
However, Muffin is designed for single-user use: it cannot set preferences and
configuration on a per-user basis and has no user authentication capability. Moreover,
Muffin does not support group usage in any form. Additionally, there is no way for Web
page authors to control the application of plug-ins to their pages; Muffin simply applies
all of them.
The V6 Engine
The V6 engine was designed by Bernard Lang and François to separate features
from browsers, which they feel are monolithic and rapidly growing to unwieldy sizes, and
incorporate them into a more modular system. The V6 engine forms the core to this
architecture, functioning as a proxy server with a modular filter design. The intent is to
provide an infrastructure to a variety of services.
The authors designed the V6 engine primarily for personal use by a single user. It
can be shared among multiple users, in which case the owner of the V6 engine acts as
administrator, but a single user will find it easier to use and configure, and the system
yields better performance. The authors do not see a significant number of applications
that require more than one user on the system.
The owner of a V6 system may configure it dynamically over HTTP, using a
browser to access a virtual URL, much as with Muffin. V6 uses a virtual hostname such as
server.v6. Depending on the pathname referenced by the URL, the system passes control
to the configuration for a particular module. For example, if the user accesses
http://service.v6/cache/, the system passes the request to the cache module to be
serviced.
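The virtual-URL dispatch can be sketched as follows. The hostname comes from the example URL in the text. The `auth` module and both handler functions are hypothetical placeholders. A request for any real host returns `None`, signalling that it should be proxied and filtered normally.

```python
from urllib.parse import urlsplit

VIRTUAL_HOST = "service.v6"  # hostname taken from the example URL in the text


def cache_service(rest: str) -> str:
    return f"cache module handles {rest!r}"


def auth_service(rest: str) -> str:  # hypothetical second module
    return f"auth module handles {rest!r}"


SERVICES = {"cache": cache_service, "auth": auth_service}


def dispatch(url: str):
    """Route virtual-host requests to a module; return None for ordinary pages."""
    parts = urlsplit(url)
    if parts.hostname != VIRTUAL_HOST:
        return None  # a real page: fetch it and run it through the filters
    module, _, rest = parts.path.lstrip("/").partition("/")
    handler = SERVICES.get(module)
    if handler is None:
        return f"404: no module named {module!r}"
    return handler(rest)


print(dispatch("http://service.v6/cache/"))
```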
Currently, only a few modules come standard with V6. One pair works together,
one serving local files and the other CGI programs, so that together they fill the role of a
simple Web server. Other modules provide a caching system and a user authentication
system.
This system distinguishes between filters and services. Filters act as stream
transducers, modifying information as it passes between the Web client and the Web
server; they filter all pages. Services, in contrast, respond only to certain URLs, possibly
virtual ones, and provide the mechanism through which the user configures the system.
One distinctive feature of the system is that it assigns stacks of filters based on the
Internet port through which the V6 engine is accessed. The stack of
filters defines the group of filters and the order that they filter the data in. Each port has
two distinct stacks attached to it. One stack is the request stack which filters the request
going from the client to the Web server. The response stack filters the Web server's reply.
Thus depending on which port the V6 proxy system is accessed through, the V6 engine
applies a separate sequence of filters.
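Per-port stacks can be pictured as a table from port number to a pair of filter lists. In this Python sketch, the port numbers, the transducers and the `fetch` callback are hypothetical stand-ins. Requests and replies are treated as plain strings.

```python
from typing import Callable, Dict, List, Tuple

Transducer = Callable[[str], str]


def tag_request(request: str) -> str:
    """Request-side transducer: append a hypothetical header line."""
    return request + "X-Filtered: yes\r\n"


def no_blink(body: str) -> str:
    return body.replace("<blink>", "").replace("</blink>", "")


def shout(body: str) -> str:
    return body.upper()


# Each listening port owns its own (request stack, response stack) pair.
STACKS: Dict[int, Tuple[List[Transducer], List[Transducer]]] = {
    8080: ([], [no_blink]),
    8081: ([tag_request], [no_blink, shout]),
}


def run_stack(stack: List[Transducer], data: str) -> str:
    for transducer in stack:  # list order is the order of application
        data = transducer(data)
    return data


def handle(port: int, request: str, fetch: Callable[[str], str]) -> str:
    """Filter the request, fetch the reply, then filter the reply."""
    request_stack, response_stack = STACKS[port]
    reply = fetch(run_stack(request_stack, request))
    return run_stack(response_stack, reply)


def fake_fetch(request: str) -> str:
    """Stand-in source server that reveals whether the request was tagged."""
    return "<blink>tagged</blink>" if "X-Filtered" in request else "<blink>plain</blink>"


print(handle(8080, "GET / HTTP/1.0\r\n", fake_fetch))
print(handle(8081, "GET / HTTP/1.0\r\n", fake_fetch))
```

Order within a stack matters. If `shout` ran before `no_blink`, the uppercased `<BLINK>` tags would no longer match and would survive.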
Like Muffin, however, the V6 engine lacks a mechanism for site authors to
control the application of filters to their page.
ByProxy
ByProxy from Besiex Creations is a freeware project currently led by Benjamin
"Quincy" Cabell V. Currently the system is implemented in Java, and can run on any
platform with a Java run-time environment.
He originally designed it to block spam from e-mail and advertisement banners from the
Web. However, the program is plug-in driven and could easily replace this behavior with
almost any other.
ByProxy divides the plug-ins into two types, "proxy agents" and "sniffers". The
documentation describes the two as follows:
As used in this program, a proxy agent is a piece of code which handles
the routine and uninteresting aspects of the conversation between two
computers. When it encounters interesting data it passes that data through
a sniffer before passing it along to the computer it was intended for. The
sniffer looks at the interesting data and can (if it wishes) change the data.
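The division of labor between a proxy agent and its sniffers might look like the following Python sketch. It is not ByProxy's Java API. It assumes that, for HTTP, the "interesting" data is an HTML body. After the sniffers have changed the body, the agent takes care of a routine detail, the `Content-Length` header.

```python
from typing import Callable, List, Tuple

Headers = List[Tuple[str, str]]
Sniffer = Callable[[str], str]


def http_agent(headers: Headers, body: str, sniffers: List[Sniffer]) -> Tuple[Headers, str]:
    """Relay a reply, passing only HTML bodies (the assumed "interesting" data) to sniffers."""
    content_type = next((v for k, v in headers if k.lower() == "content-type"), "")
    if not content_type.lower().startswith("text/html"):
        return headers, body  # uninteresting: relay unchanged
    for sniff in sniffers:
        body = sniff(body)
    # Routine bookkeeping the sniffers need not know about: fix the length.
    headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
    headers.append(("Content-Length", str(len(body.encode("utf-8")))))
    return headers, body


def banner_sniffer(html: str) -> str:
    """Hypothetical sniffer that deletes one known banner tag."""
    return html.replace('<img src="banner.gif">', "")


print(http_agent([("Content-Type", "text/html"), ("Content-Length", "30")],
                 '<p>x</p><img src="banner.gif">', [banner_sniffer]))
```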
One unique feature of ByProxy is the capability to apply proxy functions to a
variety of different network transactions. While most other proxies can only handle HTTP
or one specific type of network transaction, the ByProxy system is capable of filtering
HTTP, e-mail, newsgroups and other traffic.
Unfortunately, as with the other proxy server systems, ByProxy has no support for
sharing and groups. ByProxy really only supports a single user at a time.
OreO's
Brooks et al. take a different approach to creating a general-purpose proxy. Rather
than using a plug-in system, in which a single proxy runs multiple modular filters over the
request and response, they use a sequence of individual proxies. Applying multiple filters
thus involves stringing proxies together like beads, one filter per proxy. The UNIX pipe
model inspired this design.
Brooks et al. call their filters OreO's, after the cookie, because of their three-layer
structure with a "wafer" on each end and a "filling" in the middle. One end handles
communication with the client, the other handles communication with the server, and
the filling does the actual work of the filter.
Their system provides the wafer parts of the OreO, leaving the developer free to
concentrate on the specific processing that their OreO must do, or the filling part. The
filling resides in a separate executable from the wafer and may be in almost any computer
language.
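The wafer/filling split can be approximated with ordinary processes. In the Python sketch below, each filling is a separate program that reads the body on standard input and writes the result to standard output. Chaining fillings plays the role of stringing OreO's together. Real OreO's are separate proxies connected over the network. Collapsing them into a local pipe, and the two one-line fillings themselves, are simplifications for illustration.

```python
import subprocess
import sys
from typing import List


def run_filling(command: List[str], data: bytes) -> bytes:
    """Wafer side: hand the data to a separate filling program and collect its output."""
    result = subprocess.run(command, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


def chain_oreos(fillings: List[List[str]], data: bytes) -> bytes:
    """Approximate a string of OreO proxies by piping the data through each filling."""
    for command in fillings:
        data = run_filling(command, data)
    return data


# Two hypothetical fillings written as Python one-liners; any language that
# reads stdin and writes stdout would serve equally well.
UPPER = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
NO_BLINK = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read()"
            ".replace('<BLINK>', '').replace('</BLINK>', ''))"]

print(chain_oreos([UPPER, NO_BLINK], b"<blink>hi</blink>"))
```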
This works well for specific tasks. However, the system lacks any mechanism for
administration and configuration services, and it has no authentication system; it only
provides the communication infrastructure for the filling. Developers must handle these
tasks on their own if they wish to provide them.
Microsoft Proxy Server
The Microsoft Proxy Server is a component of the Microsoft Back Office suite for
Windows NT. It is primarily meant to be “an extensible firewall and Web cache server.”
It goes beyond being an HTTP proxy and provides proxy and firewall services at the
packet level.
The Microsoft Proxy Server is extensible in that it can employ third-party
extensions that essentially serve as plugin filters. Current extensions can scan for viruses,
filter JavaScript and ActiveX data, and block objectionable sites.
It also allows remote administration over the Web. However, as with the other
filter systems, this is at the global level. The Microsoft Proxy Server is designed to
support multiple users, and it supports accounts on a user and group basis, possibly
requiring users to log in. However, control over these accounts is entirely in the hands
of the administrator. Interestingly, it also allows a single administrator to easily manage
multiple proxy servers.
Netscape Proxy Server
Like Microsoft’s, Netscape Proxy Server is designed to provide firewall and
caching capability. It runs on a wide variety of platforms, including Windows NT and
many flavors of UNIX. It too is extensible, supporting the “Server Plug-in API,” which
allows it to use all of the extensions for other Netscape servers, with the addition of
proxy-specific capabilities. These extensions can easily filter and modify a page as it
passes through.
Similarly, it allows an administrator to manage it remotely. The administrator
can also manage groups of proxies remotely. The Netscape Proxy Server can even e-mail
the administrator about critical events. However, this remote administration does not
extend to the individual users. They do not have the ability to manage their own
accounts.
The similarities do not end there. Like Microsoft’s server, the system supports both
users and groups. The administrator can create user accounts and group accounts and place the
users into groups, using the Lightweight Directory Access Protocol (LDAP). The
administrator may set it so that users must log on to use it and the system will filter the
data appropriately for that user as dictated by the server administrator.
As with the others however, the Netscape Proxy Server does not support any form
of author control.
CGIProxy
CGIProxy is a simple system written in Perl that uses the CGI technique. It is a
CGI script that runs under a web server supporting non-parsed headers. When a link is
passed to it as part of the URL, it contacts the source server, downloads the page and
modifies it before handing the page back to the web server to return to the client.
While CGIProxy is a relatively small and minimalist system (1008 lines of Perl in
version 1.1), it does support the handling of forms and cookies. It offers the capability to
block a limited number of advertisements and banners as a demonstration. However, the
code can easily be modified to do other filtering.
The user configures the system by changing variable settings in the source code;
it does not currently support external configuration. Likewise, there is no support for
multiple user accounts or anything related to groups; it is just a very simple proxy server
platform.
CGIProxy is not meant to be a complete proxy in any way. Rather, the intent is to
provide a basic platform on which a developer can build a more comprehensive and useful
proxy system, by adapting the code and inserting specific functionality.
Comparison
System                   Filter Type   Plugin Architecture   User Accounts   Group Accounts   Author Control
Muffin                   Proxy         Yes                   No              No               No
V6 Engine                Proxy         Yes                   Limited         No               No
ByProxy                  Proxy         Yes                   No              No               No
OreO's                   Proxy         No                    No              No               No
Microsoft Proxy Server   Proxy         Yes                   Yes             Yes              No
Netscape Proxy Server    Proxy         Yes                   Yes             Yes              No
CGIProxy                 CGI           No                    No              No               No