AM403 - Bioinformatics 1
Remote Procedure Calling
Dr. Andrew C.R. Martin
[email protected]
http://www.bioinf.org.uk/
Aims and objectives
Understand the concepts of remote procedure calling and web services
Be able to describe different methods of remote procedure calls
Understand the problems of ‘screen scraping’
Know how to write code using LWP and SOAP
What is RPC?
[Figure: an RPC call crosses the network to reach a web service]
Web service: a network accessible interface to application functionality using standard Internet technologies
Why do RPC?
distribute the load between computers
access to other people's methods
access to the latest data
Ways of performing RPC
screen scraping
simple CGI scripts (REST)
custom code to work across networks
standardized methods (e.g. CORBA, SOAP, XML-RPC)
Web services
RPC methods which work across the internet are often called “Web Services”
Web Services can also
be self-describing (WSDL)
provide methods for discovery (UDDI)
Screen scraping
[Figure: a web service call (client, network, web service) compared with screen scraping (screen scraper, network, web server)]
Extracting content from a web page
Fragile procedure...
[Figure: the provider's data are published as a web page with visual markup, so the semantics are lost; the consumer's extraction is error-prone and recovers only partial data, possibly with errors]
Trying to interpret semantics from display-based markup
If the presentation changes, the screen scraper breaks
Web servers…
[Figure: a web browser sends a request for a page to the web server, which returns pages or runs a CGI script that may use an RDBMS or external programs]
Screen scraping
Straightforward in Perl
Perl LWP module
easy to write a web client
Pattern matching and string handling routines
Example scraper…
A program for secondary structure prediction
Want a program that:
specifies an amino acid sequence
provides a secondary structure prediction
Example scraper...
#!/usr/bin/perl -w
use LWP::UserAgent;
use strict;
my($seq, $ss);
# The sequence is joined from three pieces so that it contains no newlines
$seq = "KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDY" .
       "GILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNR" .
       "CKGTDVQAWIRGCRL";
if(($ss = PredictSS($seq)) ne "")
{
print "$seq\n";
print "$ss\n";
}
Example scraper…
NNPREDICT web server at
http://alexander.compbio.ucsf.edu/~nomi/nnpredict.html
http://alexander.compbio.ucsf.edu/cgi-bin/nnpredict.pl
Example scraper…
Program must:
connect to web server
submit the sequence
obtain the results and extract data
Examine the source for the page…
<form method="POST"
action="http://alexander.compbio.ucsf.edu/cgi-bin/nnpredict.pl">
<b>Tertiary structure class:</b>
<input TYPE="radio" NAME="option"
VALUE="none" CHECKED> none
<input TYPE="radio" NAME="option"
VALUE="all-alpha"> all-alpha
<input TYPE="radio" NAME="option"
VALUE="all-beta"> all-beta
<input TYPE="radio" NAME="option"
VALUE="alpha/beta"> alpha/beta
<b>Name of sequence</b>
<input name="name" size="70">
<b>Sequence</b>
<textarea name="text" rows=14 cols=70></textarea>
</form>
Example scraper…
option: 'none', 'all-alpha', 'all-beta', or 'alpha/beta'
name: optional name for the sequence
text: the sequence
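The three form fields above are what the client has to send in the body of its POST request. As a minimal sketch, the body could be assembled as below. UrlEncode() and BuildPostBody() are hypothetical helper names, not part of the lecture code, and the encoding assumes plain ASCII values. Encoding matters for a value such as 'alpha/beta', which contains a reserved character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode one value for an application/x-www-form-urlencoded body.
# Letters, digits, '-', '_', '.' and '~' pass through unchanged.
# (Assumes ASCII input.)
sub UrlEncode
{
    my($value) = @_;
    $value =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return($value);
}

# Build the POST body from an ordered list of name/value pairs
sub BuildPostBody
{
    my(@pairs) = @_;
    my @fields;
    while(my($name, $value) = splice(@pairs, 0, 2))
    {
        push @fields, UrlEncode($name) . "=" . UrlEncode($value);
    }
    return(join("&", @fields));
}

my $body = BuildPostBody(option => "alpha/beta",
                         name   => "",
                         text   => "KVFGRCELAAAMKRHG");
print "$body\n";
```

For an amino acid sequence made only of letters the encoding changes nothing, which is why a hand-written string also works in that case.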
Example server...
sub PredictSS
{
    my($seq) = @_;
    my($url, $post, $webproxy, $ua, $req, $result, $ss);

    # If behind a firewall, set the web proxy here
    # $webproxy = 'http://user:[email protected]:8080';
    $webproxy = "";
    # CGI script to access
    $url = "http://alexander.compbio.ucsf.edu/cgi-bin/nnpredict.pl";
    # Values passed to CGI script
    $post = "option=none&name=&text=$seq";

    $ua = CreateUserAgent($webproxy);         # Create an LWP-based connection
    $req = CreatePostRequest($url, $post);    # Create the post request
    $result = GetContent($ua, $req);          # Connect and get the returned page
    if(defined($result))
    {
        $ss = GetSS($result);
        return($ss);
    }
    else
    {
        print STDERR "connection failed\n";
    }
    return("");
}
Page returned by the server:
<HTML><HEAD>
<TITLE>NNPREDICT RESULTS</TITLE>
</HEAD>
<BODY bgcolor="F0F0F0">
<h1 align=center>Results of nnpredict query</h1>
<p><b>Tertiary structure class:</b> alpha/beta
<p><b>Sequence</b>:<br>
<tt>
MRSLLILVLCFLPLAALGKVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQA<br>
TNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDG<br>
NGMNAWVAWRNRCKGTDVQAWIRGCRL<br>
</tt>
<p><b>Secondary structure prediction
<i>(H = helix, E = strand, - = no prediction)</i>:<br></b>
<tt>
----EEEEEEE-H---H--EE-HHHHHHHHHH--------------HHHHHH--------<br>
------------HHHHE-------------------------------HH-----EE---<br>
---HHHHHHH--------HHHHHHH--<br>
</tt>
</body></html>
Example server…
sub GetSS
{
    my($html) = @_;
    my($ss);
    # Remove return characters
    $html =~ s/\n//g;
    # Match the last <tt>...</tt> and grab the text within the tags
    if($html =~ /^.*<tt>(.*)<\/tt>.*$/)
    {
        $ss = $1;
        # Remove <br> tags
        $ss =~ s/\<br\>//g;
        return($ss);
    }
    # No <tt> block found: the page is not in the expected format
    return("");
}
If authors changed presentation of results, this might break!
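One way to make such a break visible rather than silent is to check that the extracted text really looks like a prediction. The sketch below is an illustration only: GetSSChecked() is a hypothetical variant of GetSS(), and the test page is a shortened stand-in for the real server output. It returns undef when the page has no <tt> block or when the last block contains anything other than H, E and '-'.

```perl
use strict;
use warnings;

# Return the prediction string, or undef if the page is not as expected
sub GetSSChecked
{
    my($html) = @_;
    $html =~ s/[\r\n]//g;
    # Collect the contents of every <tt>...</tt> block (non-greedy match)
    my @blocks = ($html =~ /<tt>(.*?)<\/tt>/gi);
    return(undef) unless @blocks;
    my $ss = $blocks[-1];          # the prediction is the last block
    $ss =~ s/<br\s*\/?>//gi;       # remove <br> tags
    $ss =~ s/\s+//g;               # remove any remaining white space
    # A prediction may contain only H, E and '-'
    return(undef) unless $ss =~ /^[HE-]+$/;
    return($ss);
}

my $page = "<p><b>Sequence</b>:<br><tt>KVFGRCELAA<br>AMKRHG<br></tt>" .
           "<p><b>Secondary structure prediction</b><br>" .
           "<tt>----EEEEEE<br>-H---H<br></tt></body></html>";
my $ss = GetSSChecked($page);
print defined($ss) ? "$ss\n" : "unexpected page format\n";
```

The check does not stop the scraper breaking when the presentation changes, but the caller can then report the problem instead of returning nonsense.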
Wrappers to LWP
CreateUserAgent()
CreatePostRequest()
GetContent()
CreateGetRequest()
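The slides name these four wrappers without showing their bodies. The following is a guess at what they might look like, using only standard LWP::UserAgent and HTTP::Request calls; the agent name and timeout are arbitrary assumptions and the lecturer's own versions may differ.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Create the web client, optionally routed through a proxy
sub CreateUserAgent
{
    my($webproxy) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->agent("ExampleScraper/0.1");     # hypothetical client name
    $ua->timeout(30);
    if(defined($webproxy) && $webproxy ne "")
    {
        $ua->proxy(['http', 'ftp'], $webproxy);
    }
    return($ua);
}

# Create a POST request carrying form-encoded data
sub CreatePostRequest
{
    my($url, $post) = @_;
    my $req = HTTP::Request->new(POST => $url);
    $req->content_type('application/x-www-form-urlencoded');
    $req->content($post);
    return($req);
}

# Create a GET request (any parameters are already part of the URL)
sub CreateGetRequest
{
    my($url) = @_;
    return(HTTP::Request->new(GET => $url));
}

# Send the request; return the page content, or undef on failure
sub GetContent
{
    my($ua, $req) = @_;
    my $res = $ua->request($req);
    return($res->is_success ? $res->content : undef);
}
```

Returning undef on failure matches the way PredictSS() tests the result with defined().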
Pros and cons
Advantages
'service provider' doesn’t do anything special
Disadvantages
screen scraper will break if format changes
may be difficult to determine semantic content
Simple CGI scripts
REST: Representational State Transfer
http://en.wikipedia.org/wiki/REST
Simple CGI scripts
Extension of screen scraping
relies on service provider to provide a script designed specifically for remote access
Client identical to screen scraper
but guaranteed that the data will be parsable (plain text or XML)
Simple CGI scripts
Server's point of view
provide a modified CGI script which returns plain text
may be an option given to the CGI script
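As an illustration of the server's side, the sketch below shows a script that returns either HTML for a browser or plain text for a remote program. The 'format' option, the SEQ/SS record layout and both subroutine names are assumptions made for this example, not a real server's interface.

```perl
use strict;
use warnings;

# Build the complete CGI response (header plus body).
# $format is a hypothetical option: 'text' gives plain text, anything else HTML.
sub RenderPrediction
{
    my($format, $seq, $ss) = @_;
    if(defined($format) && $format eq "text")
    {
        return("Content-Type: text/plain\n\n" .
               "SEQ $seq\n" .
               "SS  $ss\n");
    }
    return("Content-Type: text/html\n\n" .
           "<html><body><h1>Results</h1>\n" .
           "<tt>$seq<br>$ss</tt>\n" .
           "</body></html>\n");
}

# Client side: read 'KEY value' records, no pattern matching on markup
sub ParsePlainText
{
    my($body) = @_;
    my %record;
    foreach my $line (split(/\n/, $body))
    {
        my($key, $value) = split(' ', $line, 2);
        $record{$key} = $value if(defined($value));
    }
    return(%record);
}

my $response = RenderPrediction("text", "KVFGRCEL", "--HHHH--");
my($header, $body) = split(/\n\n/, $response, 2);
my %result = ParsePlainText($body);
print "$result{SS}\n";
```

Because the plain text layout is part of the agreed interface, the provider can restyle the HTML version without breaking remote clients.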
Simple CGI scripts
'Entrez programming utilities'
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
I have provided a script you can try to extract papers from PubMed
Simple CGI scripts
Search using EUtils is performed in 2 stages:
specified search string returns a set of PubMed IDs
fetch the results for each of these PubMed IDs in turn
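The two stages can be sketched as follows. This is not the script provided with the lecture: the subroutine names are hypothetical, the reply is a canned stand-in with made-up IDs so that nothing is sent over the network, and the URLs assume the esearch.fcgi and efetch.fcgi endpoints of the current EUtils service.

```perl
use strict;
use warnings;

my $EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

# Stage 1: URL that searches PubMed and returns a list of IDs as XML
sub SearchUrl
{
    my($term, $retmax) = @_;
    $term =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return("$EUTILS/esearch.fcgi?db=pubmed&term=$term&retmax=$retmax");
}

# Pull the PubMed IDs out of the <Id> elements of the search reply
sub ExtractIds
{
    my($xml) = @_;
    return($xml =~ /<Id>(\d+)<\/Id>/g);
}

# Stage 2: URL that fetches the record for one PubMed ID
sub FetchUrl
{
    my($pmid) = @_;
    return("$EUTILS/efetch.fcgi?db=pubmed&id=$pmid&retmode=xml");
}

# Canned reply with placeholder IDs, standing in for the real search result
my $reply = "<eSearchResult><Count>2</Count><IdList>" .
            "<Id>11111111</Id><Id>22222222</Id>" .
            "</IdList></eSearchResult>";

print SearchUrl("antibody modelling", 10), "\n";
foreach my $pmid (ExtractIds($reply))
{
    print FetchUrl($pmid), "\n";
}
```

In a real client each URL would be handed to a GET request and the returned XML parsed with an XML library rather than a pattern match.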
Custom code
Generally used to distribute tasks on a local network
Code is complex
low-level OS calls
sample on the web
Custom code
Link relies on IP address and a 'port'
Ports tied to a particular service
port 80 : HTTP
port 22 : ssh
See /etc/services
Custom code
Generally a client/server model:
server listens for messages
client makes requests of the server
[Figure: the client sends a request message to the server; the server returns a response message]
Custom code: server
Server creates a 'socket' and 'binds' it to a port
Listens for a connection from clients
Responds to messages received
Replies with information sent back to the client
Custom code: client
Client creates a socket
Connects it to the IP address and port of the server
Sends data to the server
Waits for a response
Does something with the returned data
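The server and client steps can be tried out on a single machine. The sketch below is an assumption-laden toy, not the sample code mentioned earlier: it uses the core IO::Socket::INET module instead of low-level OS calls, runs the server in a child process created with fork() (so it needs a Unix-like system), and lets the operating system choose a free port. RoundTrip() is a hypothetical name.

```perl
use strict;
use warnings;
use IO::Socket::INET;

$| = 1;    # unbuffered output, so nothing is printed twice after fork()

# Send one line to a local server process and return its one-line reply
sub RoundTrip
{
    my($message) = @_;

    # Server: create a socket, bind it to a port and listen.
    # LocalPort 0 asks the operating system for any free port.
    my $listener = IO::Socket::INET->new(LocalAddr => '127.0.0.1',
                                         LocalPort => 0,
                                         Proto     => 'tcp',
                                         Listen    => 1)
        or die "cannot create server socket: $!";
    my $port = $listener->sockport();

    my $pid = fork();
    die "fork failed: $!" unless defined($pid);
    if($pid == 0)
    {
        # Child process acts as the server: accept, read, reply
        my $conn = $listener->accept() or exit(1);
        my $request = <$conn>;
        print $conn "ACK: $request";
        close($conn);
        exit(0);
    }

    # Parent process acts as the client: connect, send, wait for the reply
    close($listener);
    my $client = IO::Socket::INET->new(PeerAddr => '127.0.0.1',
                                       PeerPort => $port,
                                       Proto    => 'tcp')
        or die "cannot connect: $!";
    print $client "$message\n";
    my $reply = <$client>;
    close($client);
    waitpid($pid, 0);
    chomp($reply);
    return($reply);
}

print RoundTrip("predict KVFGRCEL"), "\n";
```

A real server would loop over accept() to handle many clients and would run on a different machine from the client.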
Standardized methods
Various methods, e.g.
CORBA
XML-RPC
SOAP
Will now concentrate on SOAP...
Advantages of SOAP
[Figure: the application client and the application code behind the web service are platform and language specific; the code that links them is platform and language independent]
Advantages of SOAP
[Figure: two applications exchange an XML message]
Information encoded in XML
Language independent
All data are transmitted as simple text
Advantages of SOAP
HTTP POST carries the SOAP request
HTTP response carries the SOAP response
Normally uses HTTP for transport
Firewalls allow access to the HTTP protocol
Same systems will allow SOAP access
Advantages of SOAP
W3C standard
Libraries available for many programming languages
XML encoding
Which of these is correct?
<phoneNumber>01234 567890</phoneNumber>
<phoneNumber>
<areaCode>01234</areaCode>
<number>567890</number>
</phoneNumber>
<phoneNumber areaCode='01234' number='567890' />
<phoneNumber areaCode='01234'>567890</phoneNumber>
SOAP XML encoding
Must define a standard way of encoding data:
Type of data being exchanged (defined by SOAP message format)
How it will be expressed in XML (defined by SOAP message format)
How the information will be exchanged (defined by various transport protocols)
SOAP messages
SOAP Envelope
  SOAP Header (optional)
    Header block
    Header block
  SOAP Body
    Message body
SOAP Envelope
<s:Envelope xmlns:s="http://www.w3.org/2001/06/soap-envelope">
  <!-- SOAP Header -->
  <s:Header>
    <!-- Header block -->
    <m:transaction xmlns:m="soap-transaction"
                   s:mustUnderstand="true">
      <transactionID>1234</transactionID>
    </m:transaction>
  </s:Header>
  <!-- SOAP Body -->
  <s:Body>
    <!-- Message body -->
    <n:predictSS xmlns:n="urn:SequenceData">
      <sequence id='P01234'>
        SARTASCWIPLKNMNTYTRSFGHSGHRPLKMNSGDGAAREST
      </sequence>
    </n:predictSS>
  </s:Body>
</s:Envelope>
Example SOAP message
Header block
Specifies data must be handled as a single 'transaction'
Message body
contains a sequence simply encoded in XML
Perfectly legal, but more common to use special RPC encoding
The RPC ideal
Ideal situation:
$ss = PredictSS($id, $sequence);
[Figure: the client sends a request message to the server; the server returns a response message]
Subroutine calls
Only important factors
the type of the variables
the order in which they are handed to the subroutine
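To see how a subroutine call maps onto a request message, the sketch below turns a method name and an ordered list of parameters into an RPC-style SOAP envelope. It is hand-rolled for illustration only: BuildRequest() and XmlEscape() are hypothetical names, every parameter is sent as xsd:string, and the SOAP 1.1 envelope and 2001 XML Schema namespaces are assumed. In practice a SOAP library does this work.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text
sub XmlEscape
{
    my($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return($text);
}

# Wrap a call as an RPC-style request.
# @params is name, value, name, value, ... in the order of the call.
sub BuildRequest
{
    my($namespace, $method, @params) = @_;
    my $xml = '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"'
            . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
            . ' xmlns:xsd="http://www.w3.org/2001/XMLSchema">' . "\n"
            . "<s:Body>\n"
            . "<n:$method xmlns:n=\"$namespace\">\n";
    while(my($name, $value) = splice(@params, 0, 2))
    {
        $xml .= "<$name xsi:type=\"xsd:string\">"
              . XmlEscape($value) . "</$name>\n";
    }
    $xml .= "</n:$method>\n</s:Body>\n</s:Envelope>\n";
    return($xml);
}

# The call PredictSS($id, $sequence) expressed as a message
print BuildRequest("urn:SequenceData", "predictSS",
                   id       => "P01234",
                   sequence => "SARTASCWIPLKNMNTYTRSFGHSGHRPLKMNSGDGAAREST");
```

The parameters are passed as a list rather than a hash so that their order, one of the two important factors above, is preserved.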
SOAP type encoding
SOAP provides standard encoding for
variable types:
integers
floats
strings
arrays
hashes
structures
…
Encoded SOAP message
<s:Envelope
xmlns:s="http://www.w3.org/2001/06/soap-envelope">
<s:Body>
<n:predictSS xmlns:n="urn:SequenceData">
<id xsi:type='xsd:string'>
P01234
</id>
<sequence xsi:type='xsd:string'>
SARTASCWIPLKNMNTYTRSFGHSGHRPLKMNSGDGAAREST
</sequence>
</n:predictSS>
</s:Body>
</s:Envelope>
Response
<s:Envelope
xmlns:s="http://www.w3.org/2001/06/soap-envelope">
<s:Body>
<n:predictSSResponse
xmlns:n="urn:SequenceData">
<ss xsi:type='xsd:string'>
---HHHHHHH-----EEEEEEEE----EEEEEEE-------</ss>
</n:predictSSResponse>
</s:Body>
</s:Envelope>
SOAP transport
SOAP is a packaging protocol
Layered on networking and transport
SOAP doesn't care what these are
SOAP transport
Generally uses HTTP, but may also use:
FTP, raw TCP, SMTP, POP3, Jabber, etc.
HTTP is pervasive across the Internet.
Request-response model of RPC matches HTTP
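When HTTP is the transport, the SOAP envelope simply becomes the body of an HTTP POST. The sketch below builds the text of such a request without sending it. WrapInHttpPost() is a hypothetical helper, the host, path and SOAPAction value are assumptions chosen for illustration, and the length calculation assumes an ASCII envelope.

```perl
use strict;
use warnings;

# Package a SOAP envelope as the text of an HTTP POST request
sub WrapInHttpPost
{
    my($host, $path, $soapAction, $envelope) = @_;
    my $length = length($envelope);    # byte count, assuming ASCII
    return(join("\r\n",
                "POST $path HTTP/1.1",
                "Host: $host",
                "Content-Type: text/xml; charset=utf-8",
                "Content-Length: $length",
                "SOAPAction: \"$soapAction\"",
                "",                     # blank line ends the headers
                $envelope));
}

my $envelope = '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">'
             . '<s:Body><m:sayHello xmlns:m="urn:hello">'
             . '<name>Andrew</name>'
             . '</m:sayHello></s:Body></s:Envelope>';

print WrapInHttpPost("localhost", "/cgi-bin/SOAPTEST.pl",
                     "urn:hello#sayHello", $envelope), "\n";
```

The server's reply is an ordinary HTTP response whose body is the SOAP response envelope, which is why the request-response model fits so naturally.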
Web service components
[Figure: a web application server contains a service listener and a service proxy, which passes requests to the application specific code]
SOAP::Lite
You need to know very little of this!
Simply need a good SOAP library
SOAP::Lite for Perl
Apache SOAP
Toolkits available for many languages
Java, C#, C++, C, PHP, Python, …
A simple SOAP server
[Figure: in a simple SOAP server, HTTPD takes the place of the service listener, a simple SOAP-specific CGI script takes the place of the service proxy, and the application code is the application specific code]
SOAP-specific CGI script
use SOAP::Transport::HTTP;
SOAP::Transport::HTTP::CGI
->dispatch_to('/home/httpd/cgi-bin/SOAPTEST')
->handle;
Directory is where application-specific code is stored
Any Perl module stored there will be accessible via SOAP
(Can limit to individual modules and routines)
Application code
Lives in a Perl module:
Filename with extension .pm
Starts with a package statement:
package mymodule;
Filename must match package name
mymodule.pm
Must return 1. Generally end file with
1;
Application code
package hello;
sub sayHello {
my($class, $user) = @_;
return "Hello $user from the SOAP server";
}
1;
Simply place this file (hello.pm) in the directory specified in the SOAP proxy
A simple SOAP client
#!/usr/bin/perl
use SOAP::Lite;
my $name = shift;
print "\nCalling the SOAP server...\n";
print "The SOAP server says:\n";
$s = SOAP::Lite
->uri('urn:hello')
->proxy('http://localhost/cgi-bin/SOAPTEST.pl');
print $s->sayHello($name)->result;
print "\n\n";
Calling the SOAP server...
The SOAP server says:
Hello Andrew from the SOAP server
SOAP::Lite
The code is very simple!
None of the hard work to package or unpack the request in XML
All the hard work is hidden…
Query
<s:Envelope
xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/1999/XMLSchema">
<s:Body>
<m:sayHello xmlns:m="urn:hello">
<name xsi:type='xsd:string'>Andrew</name>
</m:sayHello>
</s:Body>
</s:Envelope>
Response
<s:Envelope
xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/1999/XMLSchema">
<s:Body>
<n:sayHelloResponse xmlns:n="urn:hello">
<return xsi:type='xsd:string'>Hello Andrew from the SOAP server</return>
</n:sayHelloResponse>
</s:Body>
</s:Envelope>
An even simpler SOAP client
#!/usr/bin/perl
use SOAP::Lite +autodispatch=>
uri=>"urn:hello",
proxy=>"http://localhost/cgi-bin/SOAPTEST.pl";
my $name = shift;
print "\nCalling the SOAP server...\n";
print "The SOAP server says:\n";
print sayHello($name);
print "\n\n";
Summary - RPC
RPC allows access to methods and data on remote computers
Four main ways of achieving this
Screen scraping
Special CGI scripts
Custom code
Standardized methods (SOAP, etc.)
Summary - SOAP
Platform and language independent
Uses XML to wrap RPC data and requests
Various transport methods
generally use HTTP
Good toolkits make coding VERY easy
all complexity hidden
Related technologies allow
service discovery (UDDI)
self-describing services (WSDL)