Combining the powerful worlds of Python and R

Rserve + pyRserve
A network bridge from Python to the statistics package R

pyRserve – Ralph Heinkel – EuroPython 2014 Berlin

What is R?
● A tool for statistical data analysis
● and graphical representation of data and results
● Open Source
● Platforms: Linux, Windows, Mac
● Huge library of packages on CRAN
www.r-project.org

Former situation
[Diagram: Python-Land | R-Land]
Images: www.oidspace.or.uk, www.foxserve.it/media/python.png

R “in a cage”
[Diagram: Python-Land | R-Land]
rpy2 (R embedded in Python)

Building the bridge part 1: Rserve
[Diagram: Python-Land | R-Land, with Rserve]

What is Rserve?
● Rserve is a TCP/IP server for R, developed by Simon Urbanek
● Allows multiple simultaneous connections from arbitrary clients over the network
● Every client connection has its own namespace
● Clients available for Java, C/C++, C#, Ruby...
● Website: www.rforge.net/Rserve

Building the bridge part 2: pyRserve
[Diagram: Python-Land | R-Land, with pyRserve and Rserve]

What is pyRserve?
● (pure) Python client / adapter for Rserve
● Sends Python datatypes to R (Rserve) and back
● Evaluates arbitrary R expressions
● Can trigger function calls from Python in R
● Sets and gets variables in the R namespace
● Allows R code to launch callbacks into Python

Building the bridge part 3: QAP1
[Diagram: Python-Land | R-Land, with QAP1 between pyRserve and Rserve]

QAP1 - Quad Attributes Protocol V1
● Message-oriented binary protocol
    – Exchange arbitrarily complex data
    – Trigger commands
● Both Rserve and pyRserve implement a parser and a serializer for QAP1
● Synchronous protocol: the initiating side sends a message and awaits a response

Installation
R:
● Download from www.r-project.org and unpack
● ./configure --enable-R-shlib; make; make install
Rserve:
● Download from www.rforge.net/Rserve
● R CMD INSTALL Rserve_1.8.0.tar.gz
pyRserve:
● pip install pyRserve (also installs numpy if not yet available)
Works on Python 2.6, 2.7, 3.3, 3.4

Easy start ...
● Start Rserve (including R):
    R CMD Rserve
  Listens on localhost:6311 by default
  Requires extra configuration to listen on a public IP address
● Establish a connection from Python to Rserve:
    >>> import pyRserve
    >>> conn = pyRserve.connect()
  Default target is localhost:6311
  To connect to Rserve on a different host/port:
    >>> conn = pyRserve.connect(<host>[, <port>])

… easy going ...
Some connector methods and attributes:
    >>> conn.port
    6311
    >>> conn.host
    'localhost'
    >>> conn.isClosed
    False
    >>> conn.close()
    >>> conn.isClosed
    True

First real steps: String evaluation
● Simple expression in R, result returned to Python (R numbers are doubles by default, so the result is a float):
    >>> conn.eval("1 + 2")
    3.0
● Calling functions in R, e.g.
for generating an array:
    >>> conn.eval('c(1, 5, 7)')
    array([ 1., 5., 7.])
  Note: Every call to eval() returns a result (even if a variable is set).
  If that is not desired, use voidEval().
● Setting and getting variables in/from R:
    >>> conn.voidEval("aVar <- 'abc'")
    >>> conn.eval("aVar")
    'abc'

More string evaluation
● Creating a function in R and calling it:
    >>> conn.voidEval('times2<-function(x){x*2}')
    >>> conn.eval('times2(3)')
    6.0
● Executing a small script in R:
    >>> my_r_script = '''
    ... squareit <- function(x) { x**2 }
    ... squareit(4)
    ... '''
    >>> conn.eval(my_r_script)
    16.0

R Namespaces
Connector conn provides two special attributes:
    conn.r   → Standard R name space
    conn.ref → Name space for references
Properties of name spaces:
● Separate name space for every connection
● Variables can be set and retrieved, and R functions called, in a Pythonic way
● Will be deleted after the connection is closed

A more pythonic approach … (1)
Variables can be set and retrieved more elegantly. Instead of
    >>> conn.voidEval('aVar <- "abc"')
better do
    >>> conn.r.aVar = "abc"
Complex data types from Python can be set in R (the array needs 12 elements to fit a 3 x 4 shape):
    >>> import numpy
    >>> arr = numpy.arange(12)
    >>> arr.shape = (3, 4)
    >>> conn.r.aMatrix = arr
    >>> conn.eval('dim(aMatrix)')
    array([3, 4])

A more pythonic approach … (2)
Same for function calls, incl.
positional and kw args:
    >>> conn.voidEval('func0<-function(){"hello world"}')
    >>> conn.voidEval('func1<-function(v) { v*2 }')
    >>> conn.voidEval('funcKW<-function(a1=1.0, a2=4.0)'
    ...               '{ list(a1, a2) }')
    >>> conn.r.func0()
    'hello world'
    >>> conn.r.func1(5)
    10.0
    >>> conn.r.funcKW(a2=6.0)
    [1.0, 6.0]

Functions as arguments
In R many functions expect functions as arguments:
    >>> conn.voidEval('times2 <- function(x) { x*2 }')
    >>> conn.r.sapply(numpy.array([1, 2, 3]), conn.r.times2)
    array([ 2., 4., 6.])
Caveat: Don't pass Python functions as arguments! They exist only on the Python side, so R cannot resolve them.
Nonsense:
    >>> def double(v): return v*2
    ...
    >>> conn.r.sapply(numpy.array([1, 2, 3]), double)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'double' is not defined

References to variables in R
Inefficient:
    >>> conn.r.arr = numpy.array([1, 2, 3])
    >>> conn.r.sapply(conn.r.arr, conn.r.times2)
Why? Call-by-value: the data is sent back and forth! The call above is equivalent to:
    >>> conn.r.arr = numpy.array([1, 2, 3])
    >>> arr = conn.r.arr
    >>> conn.r.sapply(arr, conn.r.times2)
Better → Use pyRserve's reference namespace:
    >>> conn.ref.arr
    <RVarProxy to variable "arr">
    >>> conn.ref.arr.value()
    array([1., 2., 3.])
    >>> conn.r.sapply(conn.ref.arr, conn.r.times2)

Out-of-band messages … aka callbacks
● Allow R to push messages to Python
● These trigger pre-defined callback functions
● Rserve needs to be started with a special config file:
    $ cat oob.conf
    oob enable
    eval library(Rserve)
    $ R CMD Rserve --RS-conf oob.conf

Basic usage of out-of-band messages
Set up a callback function in Python:
    >>> def printoobmsg(msg, msg_code):
    ...     print('oob: %s %s' % (msg, msg_code))
    ...
    >>> conn.oobCallback = printoobmsg
OOBs are sent from R with self.oobSend():
    >>> conn.voidEval('self.oobSend("foo")')
    oob: foo 0
    >>> conn.voidEval('self.oobSend("bar", 11)')
    oob: bar 11

Application of oob (1): progress report
    >>> conn.voidEval('''
    ... big_job <- function(x){
    ...     y <- long_running_func1(x)
    ...     self.oobSend('33% done')
    ...     z <- long_running_func2(y)
    ...     self.oobSend('66% done')
    ...     res <- long_running_func3(z)
    ...     self.oobSend('100% done')
    ...     res
    ... }''')
    >>> def progress(msg, code): print(msg)
    ...
    >>> conn.oobCallback = progress
    >>> res = conn.r.big_job(5)
    33% done
    66% done
    100% done
    >>> ... do something with res ...

Appl. of oob (2): method dispatcher
    import pprint

    # Message codes, defined in Python and in R at the same time
    C_PRINT = conn.r.C_PRINT = 0
    C_ECHO  = conn.r.C_ECHO  = 1
    C_STORE = conn.r.C_STORE = 2

    store = []
    # Maps each message code to the Python function that handles it
    functions = {
        C_PRINT: pprint.pprint,
        C_ECHO: lambda data: data,
        C_STORE: store.append,
        ...
    }

    def dispatch(msg, msg_code):
        return functions[msg_code](msg)

    conn.oobCallback = dispatch

    conn.eval('self.oobMessage("foo", C_STORE)')
    print(store)    # prints ['foo']

Discussion of this network approach
Pro
● Only one R installation is necessary (server-side)
● R packages need to be maintained in one place only
● Makes it possible to build a compute farm of R servers
Con
● Possibly sending huge datasets across the network
● Security aspects (so primarily for in-house use)

Ralph Heinkel
Questions?
pypi.python.org/pypi/pyRserve
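
Trying the method dispatcher without Rserve
The dispatch-table idea from the method dispatcher slide can be tried out without a running Rserve. The sketch below is only an illustration under that assumption: fake_oob_send is a hypothetical stand-in for Rserve delivering an out-of-band message and is not part of pyRserve, and the message codes are plain Python constants rather than variables mirrored into R.

```python
import pprint

# Message codes, mirroring the constants on the dispatcher slide.
C_PRINT, C_ECHO, C_STORE = 0, 1, 2

store = []

# Dispatch table: message code -> handler that receives the message payload.
functions = {
    C_PRINT: pprint.pprint,
    C_ECHO: lambda data: data,
    C_STORE: store.append,
}


def dispatch(msg, msg_code):
    """Route a message to the handler registered for its code."""
    return functions[msg_code](msg)


def fake_oob_send(callback, msg, msg_code=0):
    """Hypothetical stand-in for Rserve pushing an OOB message to Python."""
    return callback(msg, msg_code)


echoed = fake_oob_send(dispatch, "ping", C_ECHO)   # handler returns the payload
fake_oob_send(dispatch, "foo", C_STORE)            # handler appends to store
print(echoed)
print(store)
```

With a live connection, assigning conn.oobCallback = dispatch takes the place of fake_oob_send, and the messages come from R instead. An unknown message code raises KeyError in this sketch; a real callback would probably want an explicit fallback handler.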