Skip to content Skip to sidebar Skip to footer

Can Rpy2 Code Be Run In Parallel?

I have some Python code that passes a data frame to R via rpy2, whereupon R processes it and I pull the resulting data.frame back to R as a PANDAS data frame via com.load_data. Th

Solution 1:

rpy works by running a Python process and an R process in parallel, and exchange information between them. It does not take into account that R calls are called in parallel using multiprocess. So in practice, each of the python processes connects to the same R process. This probably causes the issues you see.

One way to circumvent this issue is to implement the parallel processing in R, and not in Python. You then send everything at once to R, this will process it in parallel, and the result will be sent back to Python.

Solution 2:

The following (python3) code suggests that, at least if a multiprocessing.Pool is used, separate R process are being spawned for each worker process (@lgautier is this right?)

import os
import multiprocessing
import time
num_processes = 3import rpy2.robjects as robjects

deftest_r_process(pause):
    n_called = robjects.r("times.called <- times.called + 1")[0]
    r_pid = robjects.r("Sys.getpid()")[0]
    print("R process for worker {} is {}. Pausing for {} seconds.".format(
        os.getpid(), r_pid, pause))
    time.sleep(pause)
    return(r_pid, n_called)


pause_secs = [2,4,3,6,7,2,3,5,1,2,3,3]
results = {}
robjects.r("times.called <- 0")
with multiprocessing.Pool(processes=num_processes) as pool:
    for proc, n_called in pool.imap_unordered(test_r_process, pause_secs):
        results[proc]=max(n_called, results.get(proc) or0)
print("The test function should have been called {} times".format(len(pause_secs)))
for pid,called in results.items():
    print("R process {} was called {} times".format(pid,called))

On my OS X laptop results in something like

R process forworker22535 is 22535. Pausing for3 seconds.
R process forworker22533 is 22533. Pausing for2 seconds.
R process forworker22533 is 22533. Pausing for6 seconds.
R process forworker22535 is 22535. Pausing for7 seconds.
R process forworker22534 is 22534. Pausing for2 seconds.
R process forworker22534 is 22534. Pausing for3 seconds.
R process forworker22533 is 22533. Pausing for5 seconds.
R process forworker22534 is 22534. Pausing for1 seconds.
R process forworker22535 is 22535. Pausing for2 seconds.
R process forworker22534 is 22534. Pausing for3 seconds.
R process forworker22535 is 22535. Pausing for3 seconds.
The test function should have been called 12 times
R process 22533 was called 3.0 times
R process 22534 was called 5.0 times
R process 22535 was called 4.0 times

Post a Comment for "Can Rpy2 Code Be Run In Parallel?"