# A Python Concurrency Tutorial

By Edward D'Souza (June 14th, 2020)

I've always been afraid of concurrency in Python. Threading isn't supposed to be a good idea because of the GIL. Multi-processing sounds complicated. In this article I look at what happpens if you try concurrency wiith a simple, computationally intensive task. I found that parallelizing tasks with multiple processes is both effective and surpisingly simple in Python.

(Note: Code requires Python 3.8)

# Sections

## Example Problem

We are going to explore concurrency in Python using this example problem:

Given a set of 2D points ({(x0, y0), (x1, y1), etc.}), find the closest pair.

This problem has a O(n*log(n)) solution, but to keep things simple, we're going to use the naive O(n^2) solution.

### Basic algorithm

Before getting into concurrency, let's start with the basic solution to this problem.

To make things more flexible later, we're going to structure it as a function that takes in two sets of points ("from" and "to") and finds the closest pair of points between the two sets. To find the closest pair within an entire set of points, we can pass in the same set of points to both arguments.

As a baseline, let's look at the how long it takes the algorithm to complete using a single thread. Note that I'm using NUM_POINTS = 10,000 for all the examples to make the problem sizable enough to measure on my machine.

In a CPU bound task, there shouldn't be any benefit to using multiple threads, because of the GIL. But we're going to do it anyway just to see what happens.

As expected, there was no speed up with multiple threads.

## Multi-process performance

Our final move is using multiple processes. The multi-threading code above can be adapted with minimal changes to make it use multiple processes instead.

### Initial Solution

Here we see the expected, ideal 2x speed up by splitting the work across two processes.

### Maxing out the parallelism

We can clean up the code a little with multiprocessing.Pool and bump the level of parallelism to match the system's CPU count.

Note that I only have 4 cores. "cpu_count" returns 8 on my system due to hyperthreading, but since I'm executing the exact same function on all processes, I probably don't see much benefit beyond 4 parallel tasks.

Given all that, reducing total execution time to ~29% of the single threaded solution is pretty good!