
Is It Possible To Avoid Locking Overhead When Sharing Dicts Between Threads In Python?

I have a multi-threaded application in Python where threads are reading very large dicts (read from disk and never modified), which are too large to copy into thread-local storage. Then t…

Solution 1:

Multithreading in Python is of little use for CPU-bound work like this; it's better to use the multiprocessing module, because multithreading only gives a real speedup in a small number of cases.

From the official Python documentation: "CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously."
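
To see the difference in practice, here is a minimal sketch comparing a thread pool and a process pool on CPU-bound work (cpu_task and the workload sizes are made up for illustration):

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_task(n):
    # Pure-Python arithmetic: the GIL serializes this across threads.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    work = [2_000_000] * 8
    for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        with pool_cls(max_workers=8) as ex:
            list(ex.map(cpu_task, work))
        print(pool_cls.__name__, time.perf_counter() - start)

On a multi-core machine the process pool should finish several times faster; for I/O-bound tasks the thread pool would be the appropriate choice.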

Without any code from your side, I can only recommend splitting your big dictionary into several parts, processing every part with Pool.map, and merging the results in the main process.
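
As an illustration of that split/map/merge pattern (the stand-in data, the chunk size, and the per-item work in process_chunk are placeholders for your real computation):

from itertools import islice
from multiprocessing import Pool

def chunked(d, size):
    # Yield successive pieces of the dict as smaller dicts.
    it = iter(d.items())
    while chunk := dict(islice(it, size)):
        yield chunk

def process_chunk(chunk):
    # Placeholder per-item work; replace with your real computation.
    return {k: len(v) for k, v in chunk.items()}

if __name__ == "__main__":
    big_dict = {str(i): "x" * (i % 10) for i in range(100_000)}  # stand-in data
    with Pool() as pool:
        parts = pool.map(process_chunk, chunked(big_dict, 10_000))
    merged = {}
    for part in parts:  # merge the partial results in the main process
        merged.update(part)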

Unfortunately, there is no effective way to share a lot of memory between different Python processes (leaving aside shared-memory schemes based on mmap). But you can read different parts of your dictionary in different processes, or read the entire dictionary in the main process and hand small chunks to the child processes.
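
One common way to avoid shipping the data with every task is a Pool initializer that loads the dict once per worker process. This is a sketch assuming the dict has been pickled to "big_dict.pkl" (a hypothetical path) and that lookups are the per-item work:

import pickle
from multiprocessing import Pool

_shared = None  # per-process global, set once by the initializer

def init_worker(path):
    # Runs once in each child process: load the dict there instead of
    # pickling it from the parent for every task.
    global _shared
    with open(path, "rb") as f:
        _shared = pickle.load(f)

def lookup(key):
    return _shared.get(key)  # read-only access, so no locking is needed

if __name__ == "__main__":
    with Pool(initializer=init_worker, initargs=("big_dict.pkl",)) as pool:
        hits = pool.map(lookup, ["k1", "k2", "k3"])  # hypothetical keys

On fork-based platforms (Linux), a dict built in the parent before the Pool is created is also inherited copy-on-write, though reference-count updates gradually force pages to be copied.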

Also, a warning: be very careful with multiprocessing algorithms, because every extra megabyte is multiplied by the number of processes. For example, a 500 MB dictionary naively copied into 8 workers costs about 4 GB.

So, based on your pseudocode example, I can assume two possible algorithms, depending on how the compute functions relate to each other:

# "Stateless"
for line in stdin:
    res = compute_1(line) + compute_2(line) + compute_3(line)
    print res, line

# "Shared" state
for line in stdin:
    res = compute_1(line)
    res = compute_2(line, res)
    res = compute_3(line, res)
    print res, line

In the first case, you can create several workers, load each dictionary in a separate Process-based worker (a good way to keep per-process memory down), and push lines through them like a production line, as sketched below.
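
Here is a minimal sketch of such a production line; the stand-in dicts, the None stop sentinel, and the per-stage addition are all illustrative, not your real compute functions:

from multiprocessing import Process, Queue

def stage(in_q, out_q, table):
    # Each stage owns one dict (`table`) and adds its own term to the
    # running result before passing the line downstream.
    for line, partial in iter(in_q.get, None):  # None is the stop sentinel
        out_q.put((line, partial + table.get(line, 0)))
    out_q.put(None)  # forward the sentinel to the next stage

if __name__ == "__main__":
    # Stand-in dicts; in practice each stage would load its own from disk.
    tables = [{"a": 1}, {"a": 10}, {"a": 100}]
    queues = [Queue() for _ in range(len(tables) + 1)]
    workers = [Process(target=stage, args=(queues[i], queues[i + 1], t))
               for i, t in enumerate(tables)]
    for w in workers:
        w.start()
    for line in ["a", "b", "a"]:  # stand-in input stream
        queues[0].put((line, 0))
    queues[0].put(None)
    for line, total in iter(queues[-1].get, None):
        print(total, line)
    for w in workers:
        w.join()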

In the second case, you have shared state: each worker needs the result of the previous one. That is the worst case for multithreaded/multiprocess programming. But you can write the algorithm so that consecutive workers share a Queue instance and push each result downstream as soon as it is ready, without waiting for the whole cycle to finish.
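
The queue chaining looks the same as above; the only change is that each stage's computation consumes its predecessor's result instead of adding an independent term. The two stages and their dicts below are, again, just stand-ins:

from multiprocessing import Process, Queue

def transform(in_q, out_q, table):
    # This stage cannot run until the previous one has produced `res`
    # for the line, but the shared Queue lets finished lines stream
    # through without any batch-level synchronization.
    for line, res in iter(in_q.get, None):
        out_q.put((line, table.get(line, 1) * res))  # uses the previous res
    out_q.put(None)

if __name__ == "__main__":
    q1, q2, q3 = Queue(), Queue(), Queue()
    first = Process(target=transform, args=(q1, q2, {"a": 2}))   # stand-in dict
    second = Process(target=transform, args=(q2, q3, {"a": 5}))  # stand-in dict
    first.start()
    second.start()
    for line in ["a", "b"]:
        q1.put((line, 1))
    q1.put(None)
    for line, res in iter(q3.get, None):
        print(res, line)
    first.join()
    second.join()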

