Multiprocessing.pool With A Function That Has Multiple Args And Kwargs
Solution 1:
Let's have a look at two parts of your code.
First, the sumifs function declaration:
def sumifs(df, result_col, **kwargs):
Second, the call to this function with the relevant parameters.
# Those are the params
ca = read_in_table('Tab1')
keywords = {'Z': base['Consumer archetype ID']}
# This is the function call
results = pool.map(partial(sumifs, a=ca, kwargs=keywords), tasks)
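Note that this call passes the whole dictionary as one keyword argument literally named kwargs instead of unpacking it, so inside sumifs the dict ends up nested: kwargs == {'kwargs': {...}}. A minimal sketch of the difference (the show function here is just for illustration):
def show(**kwargs):
    print(kwargs)

keywords = {'Z': 'some value'}
show(kwargs=keywords)  # {'kwargs': {'Z': 'some value'}} -- dict nested under 'kwargs'
show(**keywords)       # {'Z': 'some value'}             -- dict unpacked as intended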
Update 1:
After the original code was edited, it looks like the problem is the positional argument assignment; try discarding it.
Replace the line:
results = pool.map(partial(sumifs, a=ca, kwargs=keywords), result_col)
with:
results = pool.map(partial(sumifs, ca, **keywords), result_col)
Example code:
import multiprocessing
from functools import partial

def test_func(arg1, arg2, **kwargs):
    print(arg1)
    print(arg2)
    print(kwargs)
    return arg2

if __name__ == '__main__':
    list_of_args2 = [1, 2, 3]
    just_a_dict = {'key1': 'Some value'}
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.map(partial(test_func, 'This is arg1', **just_a_dict), list_of_args2)
    print(results)
Will output (the print lines come from separate worker processes, so their order may interleave; the results list preserves the input order):
This is arg1
1
{'key1': 'Some value'}
This is arg1
2
{'key1': 'Some value'}
This is arg1
3
{'key1': 'Some value'}
[1, 2, 3]
More examples of how to use multiprocessing.Pool with a function that has multiple args and kwargs.
Update 2:
Extended example (due to comments):
I wonder, however, in the same fashion, if my function had three args and kwargs, and I wanted to keep arg1, arg3 and kwargs constant, how could I pass arg2 as a list for multiprocessing? In essence, how would I indicate to multiprocessing, in map(partial(test_func, 'This is arg1', 'This would be arg3', **just_a_dict), arg2), that the second value in partial corresponds to arg3 and not arg2?
The Update 1 code would change as follows:
# The function signature
def test_func(arg1, arg2, arg3, **kwargs):
# The map call
pool.map(partial(test_func, 'This is arg1', arg3='This is arg3', **just_a_dict), list_of_args2)
This can be done using Python's positional and keyword argument assignment.
Note that just_a_dict is unpacked with ** rather than assigned to a keyword, even though it appears after a keyword-assigned value.
More information about the differences between argument assignment styles can be found here.
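For completeness, a runnable sketch of the extended example; only arg2 varies across the mapped list, while arg1, arg3 and the kwargs stay constant:
import multiprocessing
from functools import partial

def test_func(arg1, arg2, arg3, **kwargs):
    print(arg1, arg2, arg3, kwargs)
    return arg2

if __name__ == '__main__':
    list_of_args2 = [1, 2, 3]
    just_a_dict = {'key1': 'Some value'}
    with multiprocessing.Pool(processes=3) as pool:
        # arg1 is bound positionally, arg3 by keyword, and just_a_dict is
        # unpacked into **kwargs; only arg2 comes from the mapped iterable.
        results = pool.map(
            partial(test_func, 'This is arg1', arg3='This is arg3', **just_a_dict),
            list_of_args2)
    print(results)  # [1, 2, 3]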
Solution 2:
If there is a piece of data that is constant/fixed across all workers/jobs, then it is better to "initialize" the processes in the pool with this fixed data during the creation of the pool and map over the varying data. This avoids resending the fixed data with every job request. In your case, I'd do something like the following:
df = None
kw = {}

def initialize(df_in, kw_in):
    # Runs once in each worker process when the pool starts.
    global df, kw
    df, kw = df_in, kw_in

def worker(data):
    # computation involving df, kw, and data
    ...

...
with multiprocessing.Pool(max_number_processes, initialize, (base, keywords)) as pool:
    pool.map(worker, varying_data)
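A minimal runnable sketch of this pattern; the list base and dict keywords here are toy stand-ins for the real fixed DataFrame and kwargs:
import multiprocessing

df = None
kw = {}

def initialize(df_in, kw_in):
    # Store the fixed data in module-level globals of each worker process.
    global df, kw
    df, kw = df_in, kw_in

def worker(data):
    # Toy computation combining the fixed data with one varying item.
    return sum(df) + data + len(kw)

if __name__ == '__main__':
    base = [10, 20, 30]            # stand-in for the fixed DataFrame
    keywords = {'Z': 'archetype'}  # stand-in for the fixed kwargs
    varying_data = [1, 2, 3]
    with multiprocessing.Pool(4, initialize, (base, keywords)) as pool:
        print(pool.map(worker, varying_data))  # [62, 63, 64]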
This gist contains a full blown example of using the initializer. This blog post explains the performance gains from using initializer.