I've always been afraid of concurrency in Python. Threading isn't
supposed to be a good idea because of the GIL. Multi-processing sounds
complicated. In this article I look at what happpens if you try
concurrency wiith a simple, computationally intensive task. I found that
parallelizing tasks with multiple processes is both effective and
surpisingly simple in Python.
We are going to explore concurrency in Python using this example
problem:
Given a set of 2D points ({(x0, y0), (x1, y1), etc.}), find the closest
pair.
This problem has a O(n*log(n)) solution, but to keep things simple,
we're going to use the naive O(n^2) solution.
Basic algorithm
Before getting into concurrency, let's start with the basic solution to
this problem.
To make things more flexible later, we're going to structure it as a
function that takes in two sets of points ("from" and "to") and finds
the closest pair of points between the two sets. To find the closest
pair within an entire set of points, we can pass in the same set of
points to both arguments.
Single-threaded performance
As a baseline, let's look at the how long it takes the algorithm to
complete using a single thread. Note that I'm using NUM_POINTS = 10,000
for all the examples to make the problem sizable enough to measure on my
machine.
Multi-threaded performance
In a CPU bound task, there shouldn't be any benefit to using multiple
threads, because of the
GIL. But we're going to do it anyway just to see what happens.
As expected, there was no speed up with multiple threads.
Multi-process performance
Our final move is using multiple processes. The multi-threading code
above can be adapted with minimal changes to make it use multiple
processes instead.
Initial Solution
Here we see the expected, ideal 2x speed up by splitting the work across
two processes.
Maxing out the parallelism
We can clean up the code a little with multiprocessing.Pool and bump the
level of parallelism to match the system's CPU count.
Note that I only have 4 cores. "cpu_count" returns 8 on my system due to
hyperthreading, but since I'm executing the exact same function on all
processes, I probably don't see much benefit beyond 4 parallel tasks.
Given all that, reducing total execution time to ~29% of the single
threaded solution is pretty good!