Remote Procedure Calling
Dr. Andrew C.R. Martin
[email protected]
http://www.bioinf.org.uk/

Aims and objectives
  Understand the concepts of remote procedure calling and web services
  Be able to describe different methods of remote procedure calls
  Understand the problems of 'screen scraping'
  Know how to write code using LWP and SOAP

What is RPC?
  [Diagram: RPC, a client and a web service linked across a network]
  Web Service: a network-accessible interface to application functionality, built using standard Internet technologies

Why do RPC?
  Distribute the load between computers
  Access other people's methods
  Access the latest data

Ways of performing RPC
  Screen scraping
  Simple CGI scripts (REST)
  Custom code to work across networks
  Standardized methods (e.g. CORBA, SOAP, XML-RPC)

Web services
  RPC methods that work across the Internet are often called "Web Services"
  Web Services can also:
    be self-describing (WSDL)
    provide methods for discovery (UDDI)

Screen scraping
  [Diagram: a screen scraper talks to a web server across the network, whereas a web service client talks to a web service]
  Extracting content from a web page

Fragile procedure...
  [Diagram: the data provider wraps its data in visual markup to produce a web page; the consumer extracts the data again, getting partial data, with semantics lost and an error-prone extraction step]
  Trying to interpret semantics from display-based markup
  If the presentation changes, the screen scraper breaks

Web servers…
  [Diagram: the web browser sends a request for a page to the web server, which returns static pages or runs a CGI script that may call an RDBMS or external programs]

Screen scraping
  Straightforward in Perl:
    the Perl LWP module makes it easy to write a web client
    pattern matching and string handling routines

Example scraper…
  A program for secondary structure prediction
  Want a program that:
    specifies an amino acid sequence
    provides a secondary structure prediction
#!/usr/bin/perl -w
use LWP::UserAgent;
use strict;

my($seq, $ss);
$seq = "KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDY" .
       "GILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNR" .
       "CKGTDVQAWIRGCRL";

if(($ss = PredictSS($seq)) ne "")
{
    print "$seq\n";
    print "$ss\n";
}

NNPREDICT web server at
  http://alexander.compbio.ucsf.edu/~nomi/nnpredict.html
  CGI script: http://alexander.compbio.ucsf.edu/cgi-bin/nnpredict.pl

The program must:
  connect to the web server
  submit the sequence
  obtain the results and extract the data

Examine the source for the page…

<form method="POST" action="http://alexander.compbio.ucsf.edu/cgi-bin/nnpredict.pl">
<b>Tertiary structure class:</b>
<input TYPE="radio" NAME="option" VALUE="none" CHECKED> none
<input TYPE="radio" NAME="option" VALUE="all-alpha"> all-alpha
<input TYPE="radio" NAME="option" VALUE="all-beta"> all-beta
<input TYPE="radio" NAME="option" VALUE="alpha/beta"> alpha/beta
<b>Name of sequence</b>
<input name="name" size="70">
<b>Sequence</b>
<textarea name="text" rows=14 cols=70></textarea>
</form>

The form fields passed to the CGI script are:
  option: 'none', 'all-alpha', 'all-beta' or 'alpha/beta'
  name: an optional name for the sequence
  text: the sequence
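These three fields reach the CGI script as an application/x-www-form-urlencoded POST body. Plain string concatenation works for an amino acid sequence, but values such as 'alpha/beta' or a sequence name containing spaces need escaping. A minimal sketch in core Perl; the subroutine names EscapeValue and FormEncode are hypothetical helpers, not part of the original program:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: percent-encode one value for an
# application/x-www-form-urlencoded body (spaces become '+').
sub EscapeValue
{
    my($text) = @_;
    $text = "" if(!defined($text));
    $text =~ s/([^A-Za-z0-9\-._~ ])/sprintf("%%%02X", ord($1))/ge;
    $text =~ tr/ /+/;
    return($text);
}

# Hypothetical helper: join name/value pairs, in the order given,
# into a string such as "option=none&name=&text=KVFG...".
sub FormEncode
{
    my(@pairs) = @_;
    my @fields;
    while(my($key, $value) = splice(@pairs, 0, 2))
    {
        push(@fields, EscapeValue($key) . "=" . EscapeValue($value));
    }
    return(join("&", @fields));
}

my $post = FormEncode(option => "alpha/beta",
                      name   => "my protein",
                      text   => "KVFGRCELAAAMKRHG");
print "$post\n";   # option=alpha%2Fbeta&name=my+protein&text=KVFGRCELAAAMKRHG
```

Because the result is an ordinary string, it could replace the hand-built $post string in PredictSS().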
The PredictSS() subroutine:

sub PredictSS
{
    my($seq) = @_;
    my($url, $post, $webproxy, $ua, $req, $result, $ss);

    # If behind a firewall, set a proxy to access the CGI script
    # $webproxy = 'http://user:[email protected]:8080';
    $webproxy = "";
    $url  = "http://alexander.compbio.ucsf.edu/cgi-bin/nnpredict.pl";
    $post = "option=none&name=&text=$seq";    # Values passed to the CGI script

    $ua     = CreateUserAgent($webproxy);     # Create an LWP-based connection
    $req    = CreatePostRequest($url, $post); # Create the POST request
    $result = GetContent($ua, $req);          # Connect and get the returned page

    if(defined($result))
    {
        $ss = GetSS($result);
        return($ss);
    }
    else
    {
        print STDERR "connection failed\n";
    }
    return("");
}

The page returned by the CGI script:

<HTML><HEAD>
<TITLE>NNPREDICT RESULTS</TITLE>
</HEAD>
<BODY bgcolor="F0F0F0">
<h1 align=center>Results of nnpredict query</h1>
<p><b>Tertiary structure class:</b> alpha/beta
<p><b>Sequence</b>:<br>
<tt>
MRSLLILVLCFLPLAALGKVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQA<br>
TNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDG<br>
NGMNAWVAWRNRCKGTDVQAWIRGCRL<br>
</tt>
<p><b>Secondary structure prediction <i>(H = helix, E = strand, - = no prediction)</i>:<br></b>
<tt>
----EEEEEEE-H---H--EE-HHHHHHHHHH--------------HHHHHH--------<br>
------------HHHHE-------------------------------HH-----EE---<br>
---HHHHHHH--------HHHHHHH--<br>
</tt>
</body></html>

GetSS() extracts the prediction from the returned page:

sub GetSS
{
    my($html) = @_;
    my($ss);

    $html =~ s/\n//g;                   # Remove return characters
    return("") unless                   # Match the last <tt>...</tt>
        $html =~ /^.*<tt>(.*)<\/tt>.*$/;
    $ss = $1;                           # Grab the text within the <tt> tags
    $ss =~ s/<br>//g;                   # Remove <br> tags
    return($ss);
}

If the authors changed the presentation of the results, this might break!
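PredictSS() calls wrapper subroutines whose code is not shown in these slides. A minimal sketch of what such wrappers might look like using LWP::UserAgent and HTTP::Request; this is an assumption about their behaviour, not the author's code. Returning undef on failure matches how PredictSS() tests the result with defined(), and the proxy is set only when a non-empty string is passed in. The agent string is a made-up placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Create a user agent, optionally routed through an HTTP proxy
# (pass "" or undef when no firewall proxy is needed).
sub CreateUserAgent
{
    my($webproxy) = @_;
    my $ua = LWP::UserAgent->new();
    $ua->agent("PredictSS/0.1");               # hypothetical agent string
    if(defined($webproxy) && $webproxy ne "")
    {
        $ua->proxy(['http'], $webproxy);
    }
    return($ua);
}

# Create a POST request whose body is an already-encoded form string.
sub CreatePostRequest
{
    my($url, $post) = @_;
    my $req = HTTP::Request->new(POST => $url);
    $req->content_type("application/x-www-form-urlencoded");
    $req->content($post);
    return($req);
}

# Create a GET request; form values go in the URL's query string.
sub CreateGetRequest
{
    my($url, $params) = @_;
    $url .= "?$params" if(defined($params) && $params ne "");
    return(HTTP::Request->new(GET => $url));
}

# Send the request; return the page content, or undef on failure.
sub GetContent
{
    my($ua, $req) = @_;
    my $res = $ua->request($req);
    return($res->is_success ? $res->content : undef);
}
```

Building the requests does not touch the network; only GetContent() actually connects to the server.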
Wrappers to LWP
  CreateUserAgent()
  CreatePostRequest()
  GetContent()
  CreateGetRequest()

Pros and cons
  Advantages
    the 'service provider' doesn't have to do anything special
  Disadvantages
    the screen scraper will break if the format changes
    it may be difficult to determine the semantic content

Simple CGI scripts
  REST: Representational State Transfer
  http://en.wikipedia.org/wiki/REST
  An extension of screen scraping
    relies on the service provider to provide a script designed specifically for remote access
    the client is identical to a screen scraper, but the data are guaranteed to be parsable (plain text or XML)
  From the server's point of view
    provide a modified CGI script which returns plain text
    this may be an option given to the CGI script
  'Entrez programming utilities'
    http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
    I have provided a script you can try to extract papers from PubMed
  A search using EUtils is performed in 2 stages:
    the specified search string returns a set of PubMed IDs
    the results are then fetched for each of these PubMed IDs in turn

Custom code
  Generally used to distribute tasks on a local network
  The code is complex
    low-level OS calls
    samples are available on the web
  A link relies on an IP address and a 'port'
    ports are tied to a particular service
      port 80: HTTP
      port 22: ssh
      see /etc/services
  Generally a client/server model:
    the server listens for messages
    the client makes requests of the server
  [Diagram: the client sends a request message to the server; the server returns a response message]

Custom code: server
  The server creates a 'socket' and 'binds' it to a port
  Listens for a connection from clients
  Responds to messages received
  Replies with information sent back to the client

Custom code: client
  The client creates a socket
  Connects it to the IP address and port of the server
  Sends data to the server
  Waits for a response
  Does something with the returned data

Standardized methods
  Various methods, e.g.
    CORBA
    XML-RPC
    SOAP
  Will now concentrate on SOAP...
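Before moving on, the server and client steps listed under 'Custom code' can be sketched with Perl's core IO::Socket::INET module. This is a minimal sketch under stated assumptions: the one-line text "protocol" (the server replies with the reversed request) and the subroutine names are made up, and both ends run in one process purely so the example is self-contained. A real server would run on its own and loop over accept().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Server: create a socket, bind it to a port and listen for clients.
# LocalPort => 0 asks the OS for any free port.
sub StartServer
{
    my $server = IO::Socket::INET->new(LocalAddr => "127.0.0.1",
                                       LocalPort => 0,
                                       Proto     => "tcp",
                                       Listen    => 5,
                                       ReuseAddr => 1)
        or die "Cannot create server socket: $!\n";
    return($server);
}

# Client: create a socket connected to the server's IP address and port.
sub ConnectClient
{
    my($host, $port) = @_;
    my $client = IO::Socket::INET->new(PeerAddr => $host,
                                       PeerPort => $port,
                                       Proto    => "tcp")
        or die "Cannot connect to $host:$port: $!\n";
    return($client);
}

# Server: accept one connection, read one request line, send one reply.
# The reply format ("REVERSED <text>") is invented for this sketch.
sub HandleOneRequest
{
    my($server) = @_;
    my $conn = $server->accept() or die "accept failed: $!\n";
    my $request = <$conn>;
    chomp($request);
    print $conn "REVERSED " . scalar(reverse($request)) . "\n";
    close($conn);
}

my $server = StartServer();
my $client = ConnectClient("127.0.0.1", $server->sockport());
print $client "KVFGRCELAAAMK\n";   # client sends its request message
HandleOneRequest($server);          # server replies (same process, demo only)
my $response = <$client>;           # client waits for the response message
close($client);
print $response;                    # REVERSED KMAAALECRGFVK
```

Even this toy needs both sides to agree on message framing (here, one newline-terminated line each way), which is exactly the kind of detail that standardized methods define for you.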
Advantages of SOAP
  [Diagram: the application client contains platform- and language-independent code; the web service contains platform- and language-specific application code]
  [Diagram: two applications exchange an XML message]
  Information is encoded in XML
    language independent
    all data are transmitted as simple text
  [Diagram: the SOAP request travels as an HTTP POST; the SOAP response comes back as the HTTP response]
  Normally uses HTTP for transport
    firewalls allow access to the HTTP protocol
    the same systems will therefore allow SOAP access
  W3C standard
  Libraries available for many programming languages

XML encoding
  Which of these is correct?

<phoneNumber>01234 567890</phoneNumber>

<phoneNumber>
  <areaCode>01234</areaCode>
  <number>567890</number>
</phoneNumber>

<phoneNumber areaCode='01234' number='567890' />

<phoneNumber areaCode='01234'>567890</phoneNumber>

SOAP XML encoding
  [Diagram: the message format and data encoding are defined by SOAP; the transport is defined by various transport protocols]
  Must define a standard way of encoding:
    the type of data being exchanged
    how it will be expressed in XML
    how the information will be exchanged

SOAP messages
  [Diagram: a SOAP Envelope contains an optional SOAP Header, holding header blocks, and a SOAP Body, holding the message body]

SOAP Envelope

<s:Envelope xmlns:s="http://www.w3.org/2001/06/soap-envelope">
  <!-- SOAP Header -->
  <s:Header>
    <!-- Header block -->
    <m:transaction xmlns:m="soap-transaction" s:mustUnderstand="true">
      <transactionID>1234</transactionID>
    </m:transaction>
  </s:Header>
  <!-- SOAP Body -->
  <s:Body>
    <!-- Message body -->
    <n:predictSS xmlns:n="urn:SequenceData">
      <sequence id='P01234'>
        SARTASCWIPLKNMNTYTRSFGHSGHRPLKMNSGDGAAREST
      </sequence>
    </n:predictSS>
  </s:Body>
</s:Envelope>

Example SOAP message
  The header block specifies that the data must be handled as a single 'transaction'
  The message body contains a sequence, simply encoded in XML
  This is perfectly legal, but it is more common to use a special RPC encoding

The RPC ideal
  Ideal situation:
    $ss = PredictSS($id, $sequence);
  [Diagram: the client sends a request message to the server; the server returns a response message]
  In subroutine calls, the only important factors are:
    the type of the variables
    the order in which they are handed to the subroutine

SOAP type encoding
  SOAP provides standard encoding for variable types:
    integers, floats, strings, arrays, hashes, structures, …

Encoded SOAP message

<s:Envelope xmlns:s="http://www.w3.org/2001/06/soap-envelope">
  <s:Body>
    <n:predictSS xmlns:n="urn:SequenceData">
      <id xsi:type='xsd:string'>
        P01234
      </id>
      <sequence xsi:type='xsd:string'>
        SARTASCWIPLKNMNTYTRSFGHSGHRPLKMNSGDGAAREST
      </sequence>
    </n:predictSS>
  </s:Body>
</s:Envelope>

Response

<s:Envelope xmlns:s="http://www.w3.org/2001/06/soap-envelope">
  <s:Body>
    <n:predictSSResponse xmlns:n="urn:SequenceData">
      <ss xsi:type='xsd:string'>
        ---HHHHHHH-----EEEEEEEE----EEEEEEE-------</ss>
    </n:predictSSResponse>
  </s:Body>
</s:Envelope>

SOAP transport
  SOAP is a packaging protocol
    layered on networking and transport
    SOAP doesn't care what these are
  Generally uses HTTP, but may also use FTP, raw TCP, SMTP, POP3, Jabber, etc.
    HTTP is pervasive across the Internet
    the request-response model of RPC matches HTTP

Web service components
  [Diagram: a Service Listener (web application server) passes requests to a Service Proxy, which calls the application-specific code]

SOAP::Lite
  You need to know very little of this!
  You simply need a good SOAP library
    SOAP::Lite for Perl
    Apache SOAP
  Toolkits are available for many languages: Java, C#, C++, C, PHP, Python, …

A simple SOAP server
  [Diagram: HTTPD is the service listener (web application server); a simple SOAP-specific CGI script is the service proxy; the application code holds the application-specific code]

SOAP-specific CGI script

#!/usr/bin/perl
use SOAP::Transport::HTTP;

SOAP::Transport::HTTP::CGI
    ->dispatch_to('/home/httpd/cgi-bin/SOAPTEST')
    ->handle;

  The directory is where the application-specific code is stored
  Any Perl module stored there will be accessible via SOAP
  (Access can be limited to individual modules and routines)

Application code
  Lives in a Perl module:
    filename with the extension .pm
    starts with a package statement: package mymodule;
    the filename must match the package name: mymodule.pm
    must return a true value
    generally, end the file with 1;

Application code (hello.pm)

package hello;

sub sayHello
{
    my($class, $user) = @_;
    return "Hello $user from the SOAP server";
}

1;

  Simply place this file (hello.pm) in the directory specified in the SOAP proxy

A simple SOAP client

#!/usr/bin/perl
use strict;
use warnings;
use SOAP::Lite;

my $name = shift;
print "\nCalling the SOAP server...\n";
print "The SOAP server says:\n";

my $s = SOAP::Lite
    ->uri('urn:hello')
    ->proxy('http://localhost/cgi-bin/SOAPTEST.pl');
print $s->sayHello($name)->result;
print "\n\n";

Output when run with the argument Andrew:

Calling the SOAP server...
The SOAP server says:
Hello Andrew from the SOAP server

SOAP::Lite
  The code is very simple!
  None of the hard work of packaging the request in XML or unpacking the response
  All the hard work is hidden…

Query

<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <s:Body>
    <m:sayHello xmlns:m="urn:hello">
      <name xsi:type='xsd:string'>Andrew</name>
    </m:sayHello>
  </s:Body>
</s:Envelope>

Response

<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <s:Body>
    <n:sayHelloResponse xmlns:n="urn:hello">
      <return xsi:type='xsd:string'>Hello Andrew from the SOAP server</return>
    </n:sayHelloResponse>
  </s:Body>
</s:Envelope>

An even simpler SOAP client

#!/usr/bin/perl
use SOAP::Lite +autodispatch =>
    uri   => "urn:hello",
    proxy => "http://localhost/cgi-bin/SOAPTEST.pl";

my $name = shift;
print "\nCalling the SOAP server...\n";
print "The SOAP server says:\n";
print sayHello($name);
print "\n\n";

Summary - RPC
  RPC allows access to methods and data on remote computers
  Four main ways of achieving this:
    screen scraping
    special CGI scripts
    custom code
    standardized methods (SOAP, etc.)
Summary - SOAP
  Platform and language independent
  Uses XML to wrap RPC data and requests
  Various transport methods; generally uses HTTP
  Good toolkits make coding VERY easy
    all complexity is hidden
  Related technologies allow
    service discovery (UDDI)
    self-describing services (WSDL)
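To make concrete what a toolkit hides, here is a minimal hand-rolled sketch in core Perl that builds an RPC-style request envelope like the sayHello() query shown earlier and pulls the string result out of a response. The subroutine names are hypothetical, every parameter is typed as xsd:string, and a regular expression stands in for a real XML parser, so this is for illustration only; in practice a toolkit such as SOAP::Lite does this work.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape XML special characters in element text.
sub XmlEscape
{
    my($text) = @_;
    $text =~ s/&/&amp;/g;        # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return($text);
}

# Reverse XmlEscape(), plus the common quote entities.
sub XmlUnescape
{
    my($text) = @_;
    $text =~ s/&lt;/</g;
    $text =~ s/&gt;/>/g;
    $text =~ s/&quot;/"/g;
    $text =~ s/&apos;/'/g;
    $text =~ s/&amp;/&/g;        # must come last
    return($text);
}

# Build an RPC-style request: one element named after the method, in the
# service's namespace, with one xsd:string child per (name, value) pair.
sub BuildSoapRequest
{
    my($uri, $method, @params) = @_;
    my $args = "";
    while(my($name, $value) = splice(@params, 0, 2))
    {
        $args .= "      <$name xsi:type='xsd:string'>"
               . XmlEscape($value) . "</$name>\n";
    }
    return <<"END_XML";
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <s:Body>
    <m:$method xmlns:m="$uri">
$args    </m:$method>
  </s:Body>
</s:Envelope>
END_XML
}

# Return the text of the first xsd:string-typed element in a response,
# or undef if there is none.
sub ParseSoapResponse
{
    my($xml) = @_;
    return(undef) unless
        $xml =~ m{<(\w+)\s+xsi:type=['"]xsd:string['"]>(.*?)</\1>}s;
    return(XmlUnescape($2));
}

print BuildSoapRequest("urn:hello", "sayHello", name => "Andrew");
```

Sending the string would still need an HTTP POST with the right headers, and a robust client would also have to handle faults, arrays and other types; that is the complexity SOAP::Lite keeps out of view.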