On the Structured Data Web as Infrastructure for Web Services

Douglas E. Dyer, PhD
1 July 2004

Web services, as defined by Webopedia (http://www.webopedia.com/TERM/W/Web_services.html):

The term Web services describes a standardized way of integrating Web-based applications using the XML, SOAP, WSDL and UDDI open standards over an Internet protocol backbone. XML is used to tag the data, SOAP is used to transfer the data, WSDL is used for describing the services available and UDDI is used for listing what services are available. Used primarily as a means for businesses to communicate with each other and with clients, Web services allow organizations to communicate data without intimate knowledge of each other's IT systems behind the firewall.

Unlike traditional client/server models, such as a Web server/Web page system, Web services do not provide the user with a GUI. Web services instead share business logic, data and processes through a programmatic interface across a network. The applications interface, not the users. Developers can then add the Web service to a GUI (such as a Web page or an executable program) to offer specific functionality to users.

Web services allow different applications from different sources to communicate with each other without time-consuming custom coding, and because all communication is in XML, Web services are not tied to any one operating system or programming language. For example, Java can talk with Perl, Windows applications can talk with UNIX applications. Web services do not require the use of browsers or HTML. Web services are sometimes called application services.

In other words, web services are a form of remote procedure call (RPC): clients send a request message to a server which replies with a response message. The term, originally coined by Microsoft, reflected the growing importance of the web and web servers in the late 1990s.
The need for web services arose from the desire of developers to integrate information and extend the paradigm of the web beyond dynamic web pages. In addition, the more successful RPC protocols of the day were deemed too complex. William Bordes and Johann Dumser of TechMetrix wrote in the Intranet Journal in December 2000 (http://www.intranetjournal.com/articles/200012/id_12_13_00a.html):

Among the most widely used Remote Procedure Calls (RPC), we can cite Microsoft's DCOM, or Object Management Group's Internet Inter-ORB Protocol (IIOP). But when it comes to making services communicate via Internet, these technologies reveal their limitations. This is mainly due to the richness of the DCOM and IIOP protocols that tend to complexify the implementations and applications that use them (see our report: Intranet Architectures and Performance).

One important consideration is the protocol and mechanism for transferring data. Procedure calls are simple enough, so RPC’s complication must have to do with remoteness, that is, the separation between client and server. All other things being equal, the simpler the method, the better. SOAP is a protocol that strives for simplicity, but in my opinion misses the mark by a lot. Consider Syd Egan’s article, which describes a Visual Basic example involving sales tax: http://www.vbip.com/xml/soap_syd.asp. In that example, what is a one-line call to a procedure in a local program becomes an eight-line service request that requires many additional lines of code to get POSTed and to prepare to accept the returned result. On the server side, more lines of code are required to get things going appropriately. The Intranet Journal article by Bordes and Dumser referred to earlier has a similar description and includes excellent graphics that explain SOAP messages and the processing required to POST requests and receive results.
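To make the contrast between a local call and a SOAP request concrete, here is a minimal Python sketch. It is not Egan's Visual Basic code: the operation name GetSalesTax, the parameters Amount and Rate, and the namespace urn:example-salestax are all hypothetical, and the request is built and parsed in the same process rather than POSTed to a web server.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example-salestax"                        # hypothetical service namespace


def sales_tax(amount, rate):
    """The procedure itself: locally, calling it takes one line."""
    return round(amount * rate, 2)


def build_request(amount, rate):
    """Client side: wrap the same call in a SOAP envelope (nothing is sent)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetSalesTax")
    ET.SubElement(call, "Amount").text = str(amount)
    ET.SubElement(call, "Rate").text = str(rate)
    return ET.tostring(envelope, encoding="unicode")


def handle_request(xml_text):
    """Server side: parse the envelope before the one-line call can be made."""
    root = ET.fromstring(xml_text)
    call = root.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetSalesTax")
    return sales_tax(float(call.findtext("Amount")), float(call.findtext("Rate")))


local_result = sales_tax(100.0, 0.07)                        # the local call
remote_result = handle_request(build_request(100.0, 0.07))   # the same call via SOAP
```

Even with the HTTP transport and the response envelope left out, the wrapping and unwrapping code is several times longer than the procedure it serves.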
Undoubtedly, recent advances in SOAP processing have yielded more automation, but this seems like a lot of work just to call a remote procedure. If the SOAP call does not work, there can be many reasons including (1) the message never left the client, (2) the message was rejected by the server, or (3) the message was misinterpreted by the server’s logic. It’s not clear whether the various implementations provide adequate feedback to debug messages that don’t work, but it seems likely that the visibility needed to find the problem depends on the SOAP implementation. Visibility is key, however, since different developers likely created the client and server logic, and with multiple developers, it is always important to be able to isolate problems.

Although there are alternative transport mechanisms for SOAP messages including MIME and FTP, most implementations use HTTP and a web server. Part of the complexity of implementing SOAP messaging is coaxing the web server to transport messages. Technology for doing this is not yet widespread. One apparent advantage of using a web server is the ability to get through the corporate firewall. This is great until you consider that the firewall is there for a reason, security being generally desirable [1]. When RPC calls are shoved through port 80, firewalls must become much more sophisticated to effectively analyze content in SOAP messages and determine if they pose a risk.

WSDL is used to describe the services provided in terms of their SOAP messages, their variables, and the variable types. Interactions may take the prevalent form of client-request-server-response, but alternatives are possible and WSDL can also describe these alternative behaviors. To more easily find web services and their descriptions, UDDI is used.
Actually, UDDI is also used to record business descriptions and points of contact in addition to web services; this information is indexed by business category and supports searches on a corporate tax number or business category. For the purposes of this paper, we are only interested in UDDI for finding web services. Neither WSDL nor UDDI is particularly inefficient and they are not the issue per se. Instead, this note focuses on how the structured data web (SDW) might make it simpler and easier to share information (as SOAP does) and also provide some services that the current web services don’t provide or provide in a more complicated way.

Structured Data Web:

Although alternative implementations are possible, the structured data web (SDW) has its roots in relational database servers, a mature technology built for storing, updating, and serving data. In both commercial offerings and certain free products, relational database servers normally come with access control, transactions and other methods of ensuring database consistency, a variety of indexing methods, efficient storage and retrieval, a simple, cross-vendor query language (SQL), and interface libraries for programming languages. Relational databases use a connection-based protocol which is likely to be more efficient than the connectionless protocol originally used by web servers. Database management systems live or die commercially based on query throughput performance and thus are likely to perform well. If the objective is to share structured data (variables and values), a database server seems a better starting point than a web server.

A unit of information in the structured data web is called an “information element.” For a given application, an information element is a variable name and its current value (plus metadata) in a particular problem-solving episode.
A problem-solving episode is defined in terms of the user of the application and the problem the user is solving (which we identify with a unique integer “instance number”).

Example: Suppose the application is for travel planning and the name of the application is TravelPlanner. One variable in the TravelPlanner application is Destination. For one problem-solving episode, John Bighead’s trip to Yoyodyne, Inc on the 4th of July 2000 (which happens to be John’s 37th trip, or instance 37), we can find Destination = “Newark New Jersey” [2].

Metadata includes things like the type of the variable, source of the value (which might be the user, an AI algorithm, or a helpful co-worker), the time the value was set, and any additional information thought to be relevant. If additional attributes are needed, they may be defined at any point, but to date these have been found sufficient.

[1] Granted: heavy-handed attempts to ensure security have thwarted more than one innovator.

[2] For any user of a particular application, the problem instance may often be found from examining other variable values, in this case OrganizationVisited and DatesOfVisit. Using an instance number is easier than defining, for every application, the variables that define instances. The price paid for using an instance number is that two queries are generally required to update and find information, the first to find the appropriate instance number. The problem instance is not the way most people remember application data, but it is one way they like to browse data, as evidenced by forms applications which index instance data in this way.

The database schema of the SDW is predefined and trades space for simplicity and uniformity. The attributes of information elements (identified above) are stored in a single database table. This arrangement results in redundant information being stored, but no joins are required. With the proper indexes, retrieval is very fast.
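The single-table arrangement and the two-query lookup can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table name, column names, index, the user name jbighead, and the instance 36 rows are hypothetical, and an in-memory SQLite database stands in for a relational database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a database server

# One table holds every information element; all names here are assumptions.
conn.execute("""
    CREATE TABLE InformationElement (
        user_name   TEXT    NOT NULL,
        application TEXT    NOT NULL,
        instance    INTEGER NOT NULL,  -- problem-solving episode number
        variable    TEXT    NOT NULL,
        value       TEXT,
        value_type  TEXT,              -- metadata: type of the variable
        source      TEXT,              -- metadata: who or what set the value
        time_set    TEXT               -- metadata: when the value was set
    )
""")
conn.execute("""
    CREATE INDEX ie_lookup
    ON InformationElement (user_name, application, variable, instance)
""")

rows = [
    # Instance 36 is invented so that the lookup has something to exclude.
    ("jbighead", "TravelPlanner", 36, "OrganizationVisited", "Acme Corp", "string", "user", "2000-06-01"),
    ("jbighead", "TravelPlanner", 36, "Destination", "Boston Massachusetts", "string", "user", "2000-06-01"),
    ("jbighead", "TravelPlanner", 37, "OrganizationVisited", "Yoyodyne, Inc", "string", "user", "2000-06-20"),
    ("jbighead", "TravelPlanner", 37, "DatesOfVisit", "2000-07-04", "date", "user", "2000-06-20"),
    ("jbighead", "TravelPlanner", 37, "Destination", "Newark New Jersey", "string", "user", "2000-06-20"),
]
conn.executemany("INSERT INTO InformationElement VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


def find_instance(user, app, variable, value):
    """Query 1: find the instance number from a known variable value."""
    row = conn.execute(
        "SELECT instance FROM InformationElement "
        "WHERE user_name = ? AND application = ? AND variable = ? AND value = ?",
        (user, app, variable, value),
    ).fetchone()
    return row[0] if row else None


def get_value(user, app, instance, variable):
    """Query 2: fetch the value of a variable in that instance (no joins)."""
    row = conn.execute(
        "SELECT value FROM InformationElement "
        "WHERE user_name = ? AND application = ? AND instance = ? AND variable = ?",
        (user, app, instance, variable),
    ).fetchone()
    return row[0] if row else None


instance = find_instance("jbighead", "TravelPlanner", "OrganizationVisited", "Yoyodyne, Inc")
destination = get_value("jbighead", "TravelPlanner", instance, "Destination")
```

Every row repeats the user and application names, which is the redundancy traded for a uniform schema; both lookups touch only the one table.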
To record a complete digital history of events, an identically structured table, “History,” is used to record all changes (no tuple is ever deleted, but each time an information element changes, a tuple is added). In the general case, for a query of any application/variable/problem-solving episode, multiple rows may be returned. These are the variable values over time, and they may be time-ordered by sorting on the time attribute.

Finally, there is a table for describing applications and their variables. The description table is indexed by user, enabling different users to have different versions of applications and even variables, a method representationally equivalent to version namespaces in SOAP. This description table is a reasonable place to store data dictionary entries, WSDL and UDDI information, OWL ontologies, natural language descriptions or some combination of those.

In any case, the core requirement in web services is to share information, most often in the prevalent client-request-server-response behavior of RPC. To do so in the SDW, a client application can include an information element for the request and the server can include an information element for the reply. Both applications can poll the other’s information element, noting the time attribute. In this manner, explicit communication is possible. Alternatively, information may be communicated implicitly by any application if another application is paying attention to changes in key variables. Sentinels and other advanced services may be implemented in this manner.

Note that the explicit communication method described above supports visibility, which is key to isolating problems and is especially useful when multiple developers are involved. With two queries, it’s immediately obvious whether client or server is not living up to its contract.

The structured data web only provides a mechanism for communication.
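The request/reply exchange through polled information elements might look like the following Python sketch. It makes several assumptions not found in the text: the application names TaxClient and TaxServer, the variable names Request and Reply, the fixed 7% rate, and a logical counter in place of a real timestamp. For brevity it keeps only a History-style table and treats the newest row as the current value.

```python
import sqlite3
from itertools import count

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a database server
conn.execute("""
    CREATE TABLE History (
        application TEXT, instance INTEGER, variable TEXT,
        value TEXT, time_set INTEGER
    )
""")
clock = count(1)  # logical clock standing in for a real timestamp


def set_element(app, instance, variable, value):
    """Record a change by appending a row; nothing is updated or deleted."""
    with conn:  # commit the insert as one transaction
        conn.execute(
            "INSERT INTO History VALUES (?, ?, ?, ?, ?)",
            (app, instance, variable, value, next(clock)),
        )


def history(app, instance, variable):
    """All (value, time) pairs of one information element, oldest first."""
    return conn.execute(
        "SELECT value, time_set FROM History "
        "WHERE application = ? AND instance = ? AND variable = ? "
        "ORDER BY time_set",
        (app, instance, variable),
    ).fetchall()


def latest(app, instance, variable):
    """The current value is simply the newest history row, if any."""
    rows = history(app, instance, variable)
    return rows[-1] if rows else None


def serve_once(instance, last_seen):
    """Server poll: reply only if the client's Request is newer than last_seen."""
    request = latest("TaxClient", instance, "Request")
    if request is None or request[1] <= last_seen:
        return last_seen  # nothing new to answer
    amount = float(request[0])
    set_element("TaxServer", instance, "Reply", str(round(amount * 0.07, 2)))
    return request[1]


set_element("TaxClient", 1, "Request", "100.0")  # client posts a request
seen = serve_once(1, 0)                          # server polls and replies
first_reply = latest("TaxServer", 1, "Reply")    # client polls for the reply
seen = serve_once(1, seen)                       # second poll: no new request
set_element("TaxClient", 1, "Request", "200.0")
seen = serve_once(1, seen)
replies = [value for value, _ in history("TaxServer", 1, "Reply")]
```

Because both sides write to the same table, calling history on the Request element and then on the Reply element shows at a glance which party failed to act.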
The format of the communication may be in the SOAP (XML) format or any other, although more structure and less parsing is typically preferable for performance and design simplicity.

Summary

Key advantages of using the SDW as infrastructure for web services include:

- use of an information server tuned for structured information, not files
- potentially less parsing
- reduced development and modification of servers
- availability of low-cost or free relational database servers
- technical maturity and practical stability in the server
- high performance driven by the market and supported by the technology
- access control
- widespread availability of ODBC and other interface modules for many languages
- atomicity of write operations (via transaction processing or alternative means)
- pre-defined data structure (which may be extended as needed, when needed)
- capability for asynchronous and implicit messaging
- visibility into messaging

These advantages should not be taken lightly.