On the Structured Data Web as
Infrastructure for Web Services
Douglas E. Dyer, PhD
1 July 2004
Web services, as defined by Webopedia (http://www.webopedia.com/TERM/W/Web_services.html):
The term Web services describes a standardized way of integrating Web-based
applications using the XML, SOAP, WSDL and UDDI open standards over an Internet
protocol backbone. XML is used to tag the data, SOAP is used to transfer the data,
WSDL is used for describing the services available and UDDI is used for listing what
services are available. Used primarily as a means for businesses to communicate with
each other and with clients, Web services allow organizations to communicate data
without intimate knowledge of each other's IT systems behind the firewall.
Unlike traditional client/server models, such as a Web server/Web page system, Web
services do not provide the user with a GUI. Web services instead share business logic,
data and processes through a programmatic interface across a network. The applications
interface, not the users. Developers can then add the Web service to a GUI (such as a
Web page or an executable program) to offer specific functionality to users.
Web services allow different applications from different sources to communicate with
each other without time-consuming custom coding, and because all communication is in
XML, Web services are not tied to any one operating system or programming language.
For example, Java can talk with Perl, Windows applications can talk with UNIX
applications.
Web services do not require the use of browsers or HTML.
Web services are sometimes called application services.
In other words, web services are a form of remote procedure call (RPC): clients send a request message to
a server, which replies with a response message. The term, originally coined by Microsoft, reflected the
growing importance of the web and web servers in the late 1990s. The need for web services arose from
the desire of developers to integrate information and extend the paradigm of the web beyond dynamic web
pages. In addition, the more successful RPC protocols of the day were deemed too complex. William
Bordes and Johann Dumser of TechMetrix wrote in the Intranet Journal in December 2000
(http://www.intranetjournal.com/articles/200012/id_12_13_00a.html):
Among the most widely used Remote Procedure Calls (RPC), we can cite Microsoft's
DCOM, or Object Management Group's Internet Inter-ORB Protocol (IIOP). But when it
comes to making services communicate via Internet, these technologies reveal their
limitations. This is mainly due to the richness of the DCOM and IIOP protocols that tend
to complexify the implementations and applications that use them (see our report: Intranet
Architectures and Performance).
One important consideration is the protocol and mechanism for transferring data. Procedure calls are
simple enough, so RPC’s complication must have to do with remoteness: the separation between client
and server. All other things being equal, the simpler the method, the better. SOAP is a protocol that
strives for simplicity, but in my opinion it misses the mark by a lot. Consider Syd Egan’s article, which
describes a Visual Basic example involving sales tax: http://www.vbip.com/xml/soap_syd.asp. In that
example, what is a one-line call to a procedure in a local program becomes an eight-line service request that
requires many additional lines of code to POST it and to prepare to accept the returned result. On the
server side, more lines of code are required to receive the request and dispatch it. The Intranet Journal article
by Bordes and Dumser referred to earlier has a similar description and includes excellent graphics
that explain SOAP messages and the processing required to POST requests and receive results.
Undoubtedly, recent advances in SOAP processing have yielded more automation, but this still seems like a lot
of work just to call a remote procedure.
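To make the contrast concrete, here is a minimal sketch in Python (not the Visual Basic code from Egan’s article). It shows a one-line local call next to the envelope building and parsing that a SOAP-style call needs. The operation name GetSalesTax, the namespace urn:example:salestax, and the 5% rate are assumptions made up for illustration, and the HTTP POST itself is left out: the request string is handed directly to the server-side function.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:salestax"  # hypothetical application namespace


def sales_tax(amount, rate):
    """The local procedure; calling it takes one line."""
    return round(amount * rate, 2)


def build_request(amount, rate):
    """Client side: wrap the same call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetSalesTax")
    ET.SubElement(call, f"{{{APP_NS}}}Amount").text = str(amount)
    ET.SubElement(call, f"{{{APP_NS}}}Rate").text = str(rate)
    return ET.tostring(envelope, encoding="unicode")


def handle_request(xml_text):
    """Server side: parse the envelope, extract arguments, dispatch."""
    root = ET.fromstring(xml_text)
    call = root.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetSalesTax")
    amount = float(call.find(f"{{{APP_NS}}}Amount").text)
    rate = float(call.find(f"{{{APP_NS}}}Rate").text)
    return sales_tax(amount, rate)


# Local call: one line.
local_result = sales_tax(100.0, 0.05)

# "Remote" call: build, (transport omitted), parse, dispatch.
remote_result = handle_request(build_request(100.0, 0.05))
```

Even with the transport omitted, the remote path needs two extra functions and an agreed message layout to do what the local path does in a single expression.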
If the SOAP call does not work, there can be many reasons including (1) the message never left the client,
(2) the message was rejected by the server, or (3) the message was misinterpreted by the server’s logic. It’s
not clear whether the various implementations provide adequate feedback to debug messages that don’t
work, but it seems likely that the visibility needed to find the problem depends on the SOAP
implementation. Visibility is key, however, since different developers likely created the client and server
logic, and with multiple developers, it is always important to be able to isolate problems.
Although there are alternative transport mechanisms for SOAP messages including MIME and FTP, most
implementations use HTTP and a web server. Part of the complexity of implementing SOAP messaging is
coaxing the web server to transport messages. Technology for doing this is not yet widespread.
One apparent advantage of using a web server is the ability to get through the corporate firewall. This is
great until you consider that the firewall is there for a reason, security being generally desirable (see footnote 1).
When RPC calls are shoved through port 80, firewalls must become much more sophisticated to effectively analyze
the content of SOAP messages and determine whether they pose a risk.
WSDL is used to describe the services provided in terms of their SOAP messages, their variables, and the
variable types. Interactions may take the prevalent form of client-request-server-response, but alternatives
are possible and WSDL can also describe these alternative behaviors.
To more easily find web services and their descriptions, UDDI is used. Actually, UDDI is also used to
record business descriptions and points of contact in addition to web services---this information is indexed
by business category and supports searches on a corporate tax number or business category. For the
purposes of this paper, we are only interested in UDDI for finding web services.
Neither WSDL nor UDDI is particularly inefficient and they are not the issue per se. Instead, this note
focuses on how the structured data web (SDW) might make it simpler and easier to share information (as
SOAP does) and also provide some services that the current web services don’t provide or provide in a
more complicated way.
Structured Data Web: Although alternative implementations are possible, the structured data web (SDW)
has its roots in relational database servers, a mature technology built for storing, updating, and serving data.
In both commercial offerings and certain free products, relational database servers normally come with
access control, transactions and other methods of ensuring database consistency, a variety of indexing
methods, efficient storage and retrieval, a simple, cross-vendor query language (SQL), and interface
libraries for programming languages. Relational databases use a connection-based protocol which is likely
to be more efficient than the connectionless protocol originally used by web servers. Database
management systems live or die commercially based on query throughput performance and thus are likely
to perform well. If the objective is to share structured data (variables and values), a database server seems
a better starting point than a web server.
A unit of information in the structured data web is called an “information element.” For a given
application, an information element is a variable name and its current value (plus metadata) in a particular
problem-solving episode. A problem-solving episode is defined in terms of the user of the application and
the problem the user is solving (which we identify with a unique integer “instance number”). Example:
Suppose the application is for travel planning and the name of the application is TravelPlanner. One
variable in the TravelPlanner application is Destination. One of the problem-solving episodes is John
Bighead’s trip to Yoyodyne, Inc. on the 4th of July 2000 (which happens to be John’s 37th trip, or instance
37). For that episode, we can find Destination = “Newark, New Jersey” (see footnote 2). Metadata includes things like the type of the
variable, source of the value (which might be the user, an AI algorithm, or a helpful co-worker), the time
the value was set, and any additional information thought to be relevant. If additional attributes are needed,
they may be defined at any point, but to date these have been found sufficient.
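As a rough sketch, an information element can be written down as a small record type. The field names below (value_type, source, time_set, and so on) are my own labels for the attributes just listed, not names fixed by the SDW, and the default values are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class InformationElement:
    # The problem-solving episode: who, which application, which instance.
    user: str
    application: str
    instance: int
    # The variable and its current value.
    variable: str
    value: Any
    # Metadata (hypothetical field names and defaults).
    value_type: str = "string"
    source: str = "user"  # could also be an AI algorithm or a co-worker
    time_set: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def episode(self):
        """Return the key that identifies the problem-solving episode."""
        return (self.user, self.application, self.instance)


destination = InformationElement(
    user="John Bighead",
    application="TravelPlanner",
    instance=37,
    variable="Destination",
    value="Newark, New Jersey",
)
```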
Footnote 1: Granted, heavy-handed attempts to ensure security have thwarted more than one innovator.
Footnote 2: For any user of a particular application, the problem instance may often be found from examining other
variable values, in this case OrganizationVisited and DatesOfVisit. Using an instance number is easier than
defining, for every application, the variables that define instances. The price paid for using an instance
number is that two queries are generally needed to update and find information, the first to find the appropriate
instance number. The problem instance is not the way most people remember application data, but it is one
way they like to browse data, as evidenced by forms applications which index instance data in this way.
The database schema of the SDW is predefined and trades space for simplicity and uniformity. The
attributes of information elements (identified above) are stored in a single database table.
This arrangement results in redundant information stored, but no joins are required. With the proper
indexes, retrieval is very fast.
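The single-table arrangement can be sketched with SQLite from the Python standard library. The table name, column names, index name, and sample timestamps are assumptions for illustration; the paper does not fix them. The sketch also shows the two-query pattern mentioned in footnote 2: first find the instance number from a known variable value, then fetch the variable of interest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table holds every information element; no joins are needed.
conn.execute("""
    CREATE TABLE information_element (
        user_name   TEXT NOT NULL,
        application TEXT NOT NULL,
        instance    INTEGER NOT NULL,
        variable    TEXT NOT NULL,
        value       TEXT,
        value_type  TEXT,
        source      TEXT,
        time_set    TEXT,
        PRIMARY KEY (user_name, application, instance, variable)
    )
""")
# Secondary index to find an instance number from a known value.
conn.execute("""
    CREATE INDEX by_value
    ON information_element (user_name, application, variable, value)
""")

rows = [
    ("John Bighead", "TravelPlanner", 37, "Destination",
     "Newark, New Jersey", "string", "user", "2000-06-20T14:30:00"),
    ("John Bighead", "TravelPlanner", 37, "OrganizationVisited",
     "Yoyodyne, Inc.", "string", "user", "2000-06-20T14:31:00"),
]
conn.executemany(
    "INSERT INTO information_element VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
)


def find_instance(conn, user, application, variable, value):
    """Query 1: recover the instance number from a known variable value."""
    row = conn.execute(
        "SELECT instance FROM information_element "
        "WHERE user_name = ? AND application = ? "
        "AND variable = ? AND value = ?",
        (user, application, variable, value),
    ).fetchone()
    return row[0] if row else None


def lookup(conn, user, application, instance, variable):
    """Query 2: fetch the current value of one variable in one episode."""
    row = conn.execute(
        "SELECT value FROM information_element "
        "WHERE user_name = ? AND application = ? "
        "AND instance = ? AND variable = ?",
        (user, application, instance, variable),
    ).fetchone()
    return row[0] if row else None


instance = find_instance(
    conn, "John Bighead", "TravelPlanner", "OrganizationVisited", "Yoyodyne, Inc."
)
destination = lookup(conn, "John Bighead", "TravelPlanner", instance, "Destination")
```

The user, application, and problem-solving episode are repeated in every row, which is the redundancy the schema accepts in exchange for join-free lookups.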
To record a complete digital history of events, an identically structured table, “History,” is used to record
all changes (no tuple is ever deleted, but each time an information element changes, a tuple is added). In
the general case, for a query of any application/variable/problem-solving episode, multiple rows may be
returned. These are the variable values over time, and they may be time-ordered by sorting on the time
attribute.
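A minimal sketch of the History table, again using SQLite and hypothetical column names: rows are only ever appended, and a query for one variable in one episode returns its values over time. The sample values and timestamps are invented for illustration, and the timestamps are assumed to be ISO 8601 strings so that sorting them as text also sorts them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same columns as the main table, but with no primary key:
# many rows may exist for the same variable in the same episode.
conn.execute("""
    CREATE TABLE history (
        user_name   TEXT NOT NULL,
        application TEXT NOT NULL,
        instance    INTEGER NOT NULL,
        variable    TEXT NOT NULL,
        value       TEXT,
        value_type  TEXT,
        source      TEXT,
        time_set    TEXT
    )
""")


def record_change(conn, user, application, instance, variable,
                  value, value_type, source, time_set):
    """Append one row per change; nothing is updated or deleted."""
    conn.execute(
        "INSERT INTO history VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (user, application, instance, variable,
         value, value_type, source, time_set),
    )


def value_history(conn, user, application, instance, variable):
    """Return (time_set, value, source) rows, oldest first."""
    return conn.execute(
        "SELECT time_set, value, source FROM history "
        "WHERE user_name = ? AND application = ? "
        "AND instance = ? AND variable = ? "
        "ORDER BY time_set",
        (user, application, instance, variable),
    ).fetchall()


# Rows are inserted out of order on purpose; the query sorts them.
record_change(conn, "John Bighead", "TravelPlanner", 37, "Destination",
              "Newark, New Jersey", "string", "user", "2000-06-20T14:30:00")
record_change(conn, "John Bighead", "TravelPlanner", 37, "Destination",
              "New York, New York", "string", "AI algorithm", "2000-06-18T09:00:00")

changes = value_history(conn, "John Bighead", "TravelPlanner", 37, "Destination")
current_value = changes[-1][1]
```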
Finally, there is a table for describing applications and their variables. The description table is indexed by
user, enabling different users to have different versions of applications and even variables, a method
representationally equivalent to version namespaces in SOAP. This description table is a reasonable place
to store data dictionary entries, WSDL and UDDI information, OWL ontologies, natural language
descriptions or some combination of those.
In any case, the core requirement in web services is to share information, most often in the prevalent
client-request-server-response behavior of RPC. To do so in the SDW, a client application can include an
information element for the request and the server can include an information element for the reply. Both
applications can poll the other’s information element, noting the time attribute. In this manner, explicit
communication is possible. Alternatively, information may be communicated implicitly by any application
if another application is paying attention to changes in key variables. Sentinels and other advanced services
may be implemented in this manner.
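The explicit request/reply exchange might look like the following sketch, which reuses the sales tax example. The application names TaxClient and TaxServer, the variable names Request and Reply, the 5% rate, and the reduced table layout are all assumptions for illustration. Timestamps are passed in explicitly to keep the example deterministic; a real application would read the clock.

```python
import sqlite3

TAX_RATE = 0.05  # hypothetical rate

conn = sqlite3.connect(":memory:")
# Reduced version of the information element table (user and
# most metadata columns omitted to keep the sketch short).
conn.execute("""
    CREATE TABLE information_element (
        application TEXT NOT NULL,
        instance    INTEGER NOT NULL,
        variable    TEXT NOT NULL,
        value       TEXT,
        time_set    TEXT,
        PRIMARY KEY (application, instance, variable)
    )
""")


def set_element(conn, application, instance, variable, value, time_set):
    """Create or overwrite one information element."""
    conn.execute(
        "INSERT OR REPLACE INTO information_element VALUES (?, ?, ?, ?, ?)",
        (application, instance, variable, value, time_set),
    )


def get_element(conn, application, instance, variable):
    """Return (value, time_set) or None if the element does not exist."""
    return conn.execute(
        "SELECT value, time_set FROM information_element "
        "WHERE application = ? AND instance = ? AND variable = ?",
        (application, instance, variable),
    ).fetchone()


def server_poll(conn, last_seen, now):
    """One polling pass: answer the request if it is newer than last_seen."""
    row = get_element(conn, "TaxClient", 1, "Request")
    if row is None or row[1] <= last_seen:
        return last_seen  # nothing new to do
    value, requested_at = row
    tax = round(float(value) * TAX_RATE, 2)
    set_element(conn, "TaxServer", 1, "Reply", str(tax), now)
    return requested_at


# Client writes its request element.
set_element(conn, "TaxClient", 1, "Request", "100.0", "2004-07-01T12:00:00")

# Server polls, sees a new request, and writes its reply element.
seen = server_poll(conn, "", "2004-07-01T12:00:01")

# Client polls the server's reply element.
reply = get_element(conn, "TaxServer", 1, "Reply")
```

Because both the request and the reply sit in the table, a developer can run get_element on each side to see which party failed to act, which is the visibility argument made in the text.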
Note that the explicit communication method described above supports the visibility that is key to isolating
problems and is especially useful when multiple developers are involved. With two queries, it’s immediately obvious
whether client or server is not living up to its contract.
The structured data web only provides a mechanism for communication. The format of the communication
may be SOAP (XML) or any other, although more structure and less parsing is typically better for
performance and design simplicity.
Summary
Key advantages of using the SDW as infrastructure for web services include:
use of an information server tuned for structured information, not files
potentially less parsing
reduced development and modification of servers
availability of low-cost or free relational database servers
technical maturity and practical stability in the server
high performance driven by the market and supported by the technology
access control
widespread availability of ODBC and other interface modules for many languages
atomicity of write operations (via transaction processing or alternative means)
pre-defined data structure (which may be extended as needed, when needed)
capability for asynchronous and implicit messaging
visibility into messaging
These advantages should not be taken lightly.