Parallelizing Wheel Downloads

McSinyx
Published: 08/17/2020

And now it's clear as this promise
That we're making
Two progress bars into one

Hello there! It has been raining a lot lately and my wisdom tooth has decided to start growing today, causing me a mild fever. To whoever reading this, I hope it wouldn't happen to you.

Download Parallelization

I've been working on pip's download parallelization for quite a while now. As distribution download in pip was modeled as a lazily evaluated iterable of chunks, parallelizing such procedure is as simple as submitting routines that write files to disk to a worker pool.

Or at least that is what I thought.

Progress Reporting UI

pip is currently using customly defined progress reporting classes, which was not designed to working with multithreading code. Firstly, I want to try using these instead of defining separate UI for multithreaded progresses. As they use system signals for termination, one must the progress bars has to be running the main thread. Or sort of.

Since the progress bars are designed as iterators, I realized that we can call next on them. So quickly, I throw in some queues and locks, and prototyped the first working implementation of progress synchronization.

Performance Issues

Welp, I only said that it works, but I didn't mention the performance, which is terrible. I am pretty sure that the slow down is with the synchronization, since the map_multithread call doesn't seem to trigger anything that may introduce any sort of blocking.

This seems like a lot of fun, and I hope I'll get better tomorrow to continue playing with it!

DJDT

Versions

Time

Settings from gsoc.settings

Headers

Request

SQL queries from 1 connection

Static files (2312 found, 3 used)

Templates (11 rendered)

Cache calls from 1 backend

Signals

Log messages