Combining the powerful worlds of Python and R
Rserve + pyRserve
A network bridge from Python to the statistics package R
pyRserve – Ralph Heinkel – EuroPython 2014 Berlin
What is R?
● A tool for statistical data analysis
● and graphical representation of data and results
● Open Source
● Platforms: Linux, Windows, Mac
● Huge library of packages on CRAN
www.r-project.org
Former situation
Python-Land
R-Land
Images:
www.oidspace.or.uk
www.foxserve.it/media/python.png
R “in a cage”
Python-Land
R-Land
rpy2 (R embedded in Python)
Building the bridge part 1: Rserve
Python-Land
R-Land
Rserve
What is Rserve?
● Rserve is a TCP/IP server for R, developed by Simon Urbanek
● Allows for multiple, simultaneous connections from arbitrary clients over the network
● Every client connection has its own namespace
● Clients available for Java, C/C++, C#, Ruby...
● Website: www.rforge.net/Rserve
Building the bridge part 2: pyRserve
Python-Land
R-Land
pyRserve
Rserve
What is pyRserve?
● (Pure) Python client / adapter for Rserve
● Sends Python datatypes to R (Rserve) and back
● Allows evaluating arbitrary R expressions
● Can trigger function calls from Python in R
● Allows setting and getting variables in the R namespace
● Allows R code to launch callbacks into Python
Building the bridge part 3: QAP1
Python-Land
R-Land
QAP1
pyRserve
Rserve
QAP1 - Quad Attributes Protocol V1
● Message-oriented binary protocol
  – Exchange arbitrarily complex data
  – Trigger commands
● Both Rserve and pyRserve have an implementation of a parser and serializer for QAP1
● Synchronous protocol: the initiating side sends a message and awaits a response
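The idea of a message-oriented binary protocol can be illustrated with a toy framing sketch. The header layout used here (a 32-bit command code followed by a 32-bit payload length, little-endian) and the names CMD_EVAL, pack_message and unpack_message are assumptions made purely for illustration. This is not the actual QAP1 wire format, which is specified in the Rserve documentation.

```python
import struct

# Hypothetical header: command code and payload length, both 32-bit little-endian.
HEADER = struct.Struct("<II")
CMD_EVAL = 3  # illustrative command code, not taken from the QAP1 specification


def pack_message(command, payload):
    """Prefix the payload bytes with a fixed-size header."""
    return HEADER.pack(command, len(payload)) + payload


def unpack_message(data):
    """Split a framed message back into (command, payload)."""
    command, length = HEADER.unpack_from(data, 0)
    start = HEADER.size
    payload = data[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return command, payload


# One request as the initiating side might frame it before waiting for the reply.
request = pack_message(CMD_EVAL, b"1 + 2")
print(unpack_message(request))
```

In a synchronous exchange the sender would write such a frame to the socket and then block until a complete response frame has been read.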
Installation
R:
● Download from www.r-project.org and unpack
● ./configure --enable-R-shlib; make; make install
Rserve:
● Download from www.rforge.net/Rserve
● R CMD INSTALL Rserve_1.8.0.tar.gz
pyRserve:
● pip install pyRserve (also installs numpy if not yet available)
Works on Python 2.6, 2.7, 3.3, 3.4
Easy start ...
● Start Rserve (including R):
R CMD Rserve
Listens on localhost:6311 by default
Requires extra configuration to listen on a public IP address
● Establish a connection from Python to Rserve:
>>> import pyRserve
>>> conn = pyRserve.connect()
Default target is localhost:6311
To connect to Rserve on a different host/port:
>>> conn = pyRserve.connect(<host>[, <port>])
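Before connecting, it can be handy to check that something is actually listening on the Rserve port. The following is a minimal sketch using only the standard library. The function name rserve_reachable is hypothetical, and the check only verifies that a TCP connection can be opened, not that the peer really is Rserve.

```python
import socket


def rserve_reachable(host="localhost", port=6311, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

A call such as rserve_reachable() tests the default target; passing another host and port tests a remote server.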
… easy going ...
Some connector methods and attributes:
>>> conn.port
6311
>>> conn.host
localhost
>>> conn.isClosed
False
>>> conn.close()
>>> conn.isClosed
True
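Since a connection exposes close(), it can be wrapped so that it is always closed, even if an error occurs in between. This is a small sketch under the assumption that the connection object only needs a close() method. The helper name run_with_connection is hypothetical and not part of pyRserve.

```python
from contextlib import closing


def run_with_connection(connect, work):
    """Open a connection with connect(), hand it to work(), and always close it."""
    with closing(connect()) as conn:
        return work(conn)


# Intended use against a running Rserve (not executed here):
#   import pyRserve
#   result = run_with_connection(pyRserve.connect,
#                                lambda conn: conn.eval("1 + 2"))
```

Passing the connect function instead of an open connection keeps opening and closing in one place.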
First real steps: String evaluation
● Simple expression in R, result returned to Python:
>>> conn.eval("1 + 2")
3.0
● Calling functions in R, e.g. for generating an array:
>>> conn.eval('c(1, 5, 7)')
array([ 1., 5., 7.])
Note: Every call to eval() returns a result (even if a variable is set). If that is not desired, use voidEval().
● Setting and getting variables in/from R:
>>> conn.voidEval("aVar <- 'abc'")
>>> conn.eval("aVar")
'abc'
More string evaluation
● Creating a function in R and calling it:
>>> conn.voidEval('times2 <- function(x) { x*2 }')
>>> conn.eval('times2(3)')
6.0
● Executing a small script in R:
>>> my_r_script = '''
squareit <- function(x)
{ x**2 }
squareit(4)
'''
>>> conn.eval(my_r_script)
16.0
R Namespaces
Connector conn provides two special attributes:
conn.r → Standard R namespace
conn.ref → Namespace for references
Properties of namespaces:
● Separate namespace for every connection
● Allow setting/getting variables in R and making function calls in a Pythonic way
● Will be deleted after the connection is closed
A more pythonic approach … (1)
Variables can be set and retrieved more elegantly.
Instead of
>>> conn.voidEval('aVar <- "abc"')
better do
>>> conn.r.aVar = "abc"
Complex data types from Python can be set in R:
>>> import numpy
>>> arr = numpy.arange(1, 13)  # 12 elements, so that the array fits a 3 x 4 matrix
>>> arr.shape = (3, 4)
>>> conn.r.aMatrix = arr
>>> conn.eval('dim(aMatrix)')
array([3, 4])
A more pythonic approach … (2)
The same works for function calls, including positional and keyword arguments:
>>> conn.voidEval('func0 <- function() { "hello world" }')
>>> conn.voidEval('func1 <- function(v) { v*2 }')
>>> conn.voidEval('funcKW <- function(a1=1.0, a2=4.0)'
...               '{ list(a1, a2) }')
>>> conn.r.func0()
"hello world"
>>> conn.r.func1(5)
10
>>> conn.r.funcKW(a2=6.0)
[1.0, 6.0]
Functions as arguments
In R many functions expect functions as arguments:
>>> from numpy import array
>>> conn.voidEval('times2 <- function(x) { x*2 }')
>>> conn.r.sapply(array([1, 2, 3]), conn.r.times2)
array([ 2., 4., 6.])
Caveat: Don't pass Python functions as arguments!
Nonsense:
>>> def double(v):
...     return v*2
...
>>> conn.r.sapply(array([1, 2, 3]), double)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'double' is not defined
References to variables in R
Inefficient:
>>> conn.r.arr = numpy.array([1, 2, 3])
>>> conn.r.sapply(conn.r.arr, conn.r.times2)
Why? Call-by-value – data is sent back and forth!
>>> conn.r.arr = numpy.array([1, 2, 3])
>>> arr = conn.r.arr
>>> conn.r.sapply(arr, conn.r.times2)
Better → Use pyRserve's reference namespace
>>> conn.ref.arr
<RVarProxy to variable "arr">
>>> conn.ref.arr.value()
array([1., 2., 3.])
>>> conn.r.sapply(conn.ref.arr, conn.r.times2)
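The difference between call-by-value and a reference can be shown with a toy model that runs without R. A plain dict stands in for the R namespace on the server, and the class VarProxy is a hypothetical stand-in that only mimics the idea of a proxy object. It is not pyRserve's implementation.

```python
class VarProxy:
    """Holds only a variable name; the data stays in the remote namespace."""

    def __init__(self, namespace, name):
        self._namespace = namespace
        self._name = name

    def value(self):
        """Fetch the actual data on demand."""
        return self._namespace[self._name]

    def __repr__(self):
        return '<VarProxy to variable "%s">' % self._name


# A plain dict stands in for the R namespace living on the server side.
remote_namespace = {"arr": [1.0, 2.0, 3.0]}

ref = VarProxy(remote_namespace, "arr")
print(ref)          # only the name would have to travel over the network
print(ref.value())  # the data is fetched only when explicitly requested
```

Passing such a proxy as an argument means that only the variable name is sent, while the data itself stays where the computation happens.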
Out-of-bound messages … aka callbacks
● Allows R to push messages to Python
● These trigger pre-defined callback functions
● Rserve needs to be started with a special config file:
$ cat oob.conf
oob enable
eval library(Rserve)
$ R CMD Rserve --RS-conf oob.conf
Basic usage of Out-of-bound messages
Set up a callback function in Python:
>>> def printoobmsg(msg, msg_code):
...     print('oob:', msg, msg_code)
...
>>> conn.oobCallback = printoobmsg
OOB messages are sent from R with self.oobSend():
>>> conn.voidEval('self.oobSend("foo")')
oob: foo 0
>>> conn.voidEval('self.oobSend("bar", 11)')
oob: bar 11
Application of oob (1): progress report
>>> conn.voidEval('''
big_job <- function(x){
    y <- long_running_func1(x)
    self.oobSend('33% done')
    z <- long_running_func2(y)
    self.oobSend('66% done')
    res <- long_running_func3(z)
    self.oobSend('100% done')
    res}''')
>>> def progress(msg, code): print(msg)
...
>>> conn.oobCallback = progress
>>> res = conn.r.big_job(5)
33% done
66% done
100% done
>>> # ... do something with res ...
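Instead of just printing the text, a callback could turn such progress messages into numbers, e.g. to drive a progress bar. The sketch below assumes that messages look like "33% done" and simulates them directly, without a running Rserve. The name make_progress_callback is hypothetical.

```python
import re


def make_progress_callback(on_percent):
    """Build an OOB-style callback that extracts a percentage from each message."""
    pattern = re.compile(r"(\d+)%")

    def callback(msg, msg_code):
        match = pattern.search(str(msg))
        if match:
            on_percent(int(match.group(1)))

    return callback


# Simulate the three messages sent by big_job by calling the callback directly.
seen = []
callback = make_progress_callback(seen.append)
for text in ("33% done", "66% done", "100% done"):
    callback(text, 0)
print(seen)
```

With a live connection the returned function would be assigned to conn.oobCallback in the same way as progress above.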
Appl. of oob (2): method dispatcher
import pprint

C_PRINT = conn.r.C_PRINT = 0
C_ECHO = conn.r.C_ECHO = 1
C_STORE = conn.r.C_STORE = 2
store = []
functions = {
    C_PRINT: pprint.pprint,
    C_ECHO: lambda data: data,
    C_STORE: store.append,
    ... }
def dispatch(msg, msg_code):
    return functions[msg_code](msg)
conn.oobCallback = dispatch
conn.eval('self.oobMessage("foo", C_STORE)')
print(store)
['foo']
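The dispatcher pattern can be tried out without R by calling the callback directly. This standalone sketch also shows one possible way of handling unknown message codes. Raising ValueError is an assumption made here, not behaviour prescribed by pyRserve.

```python
C_ECHO = 1
C_STORE = 2

store = []
handlers = {
    C_ECHO: lambda data: data,
    C_STORE: store.append,
}


def dispatch(msg, msg_code):
    """Route a message to the handler registered for its code."""
    try:
        handler = handlers[msg_code]
    except KeyError:
        raise ValueError("no handler registered for message code %r" % (msg_code,))
    return handler(msg)


# Called directly here; with pyRserve it would be assigned to conn.oobCallback.
dispatch("foo", C_STORE)
print(dispatch("bar", C_ECHO))
print(store)
```

An explicit error for unknown codes makes a mismatch between the constants on the R side and the Python side easier to spot than a bare KeyError.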
Discussion of this network approach
Pro
● Only one R installation is necessary (server-side)
● R packages need to be maintained in one place only
● Allows building a compute farm of R servers
Con
● Possibly sending huge datasets across the network
● Security aspects (so primarily for in-house use)
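As a rough illustration of the compute farm idea, jobs could be spread over several Rserve hosts in round-robin order. The host names below are hypothetical, and the sketch only computes the assignment. Opening the connections and running the jobs is left out.

```python
from itertools import cycle

# Hypothetical Rserve hosts; replace with real addresses in practice.
SERVERS = [("r-node-1", 6311), ("r-node-2", 6311), ("r-node-3", 6311)]


def assign_jobs(jobs, servers):
    """Pair each job with a server in round-robin order."""
    rotation = cycle(servers)
    return [(job, next(rotation)) for job in jobs]


for job, (host, port) in assign_jobs(["job-a", "job-b", "job-c", "job-d"], SERVERS):
    print(job, "->", host, port)
```

Round-robin ignores how long each job takes, so a real setup would probably want to consider the load of each server as well.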
Ralph Heinkel
Questions?
pypi.python.org/pypi/pyRserve