peterbell10's Blog

For fork's sake - Fixing multiprocessing issues

peterbell10
Published: 07/15/2019

What did you do this week?

It was also reported that scipy.fft does not interact well with Python's multiprocessing library. I was able to track down the issue and resolve it within a few hours, adding a unit test to ensure it wouldn't break again. However, when I went to update my uarray PR to use the new fast C++ version, the test started to fail. It turns out the uarray multimethod objects needed to be pickleable in a very specific way that mimics the behaviour of Python function objects. Thankfully, applying this fix resolved the issue.
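
As a rough illustration of what "pickleable like a function" means: Python pickles plain functions by reference, storing only the module and qualified name and looking the object up again on load. The sketch below is not uarray's code. `Multimethod` and the `demo_fft` module are hypothetical stand-ins, and it assumes that returning a string from `__reduce__` is an acceptable way to get the same by-reference behaviour.

```python
import pickle
import sys
import types

# Hypothetical module standing in for a real package such as scipy.fft.
demo_fft = types.ModuleType("demo_fft")
sys.modules["demo_fft"] = demo_fft


class Multimethod:
    """Toy callable that pickles by reference, like a plain function."""

    def __init__(self, module, name, impl):
        self.__module__ = module
        self.__name__ = name
        self.__qualname__ = name
        self._impl = impl

    def __call__(self, *args, **kwargs):
        return self._impl(*args, **kwargs)

    def __reduce__(self):
        # Returning a string tells pickle to treat the object as a global:
        # only the module and qualified name are stored, never the
        # implementation itself.
        return self.__qualname__


# Register the multimethod as a module-level global so it can be found again.
# The lambda could not be pickled by value, which shows it is never serialised.
demo_fft.fft = Multimethod("demo_fft", "fft", lambda x: list(x))

payload = pickle.dumps(demo_fft.fft)
restored = pickle.loads(payload)
print(restored is demo_fft.fft)
```

Because only the name travels through the pickle, a worker process that imports the same module ends up with its own copy of the same global, which is what multiprocessing relies on for ordinary functions.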

This week I've also:

  • added Hermitian symmetric transforms to scipy.fft (merged in scipy#10425)
  • added scipy.fft to the 1.4.0 release notes
  • updated my PR adding uarray support to scipy.fft
  • started work on adding a `scipy.fft` compatible interface to pyfftw in pyFFTW#269. This will be one of the first useful backend implementations for the new backend system.

What is coming up next?

The backend system PR is set to be merged very soon so I should be able to focus on adding the uarray protocol to pyfftw and possibly contributing compatible backends to other FFT libraries.

I can also work on pre-planned transforms.

Did you get stuck anywhere?

The multiprocessing incompatibilities posed an unexpected challenge but I was able to work through the problems alright.

 


Improving uarray performance

peterbell10
Published: 07/08/2019

What did you do this week?

I have been focused on my uarray PR (uarray#1780). uarray defines a protocol for dispatching function calls to multiple different backend implementations. In my PR, I've been reimplementing the core function dispatch mechanism in C++ using the Python C-API. This week I've moved the backend registration system into C++, which means the protocol is now 100% C++. This has brought the overhead down from ~5 µs per function call to just ~700 ns, or about 10 times the cost of a normal Python function call. This overhead was one of the main blockers for the adoption of uarray, so it is very nice to see it come down.
    
I've also updated the vendored version of pypocketfft in scipy#10393. This new version includes a small cache for the FFT "twiddle factors" which I helped implement. This improves benchmarks by ~20% in most cases or as much as 60% for some input sizes.
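
The idea behind a twiddle-factor cache can be sketched in a few lines of Python. This is not pocketfft's implementation: it uses a naive O(n^2) DFT and `functools.lru_cache` purely to show that transforms of the same length can reuse the same precomputed roots of unity. The function names and the cache size are assumptions made for the example.

```python
import cmath
from functools import lru_cache


@lru_cache(maxsize=16)  # small cache; the size is an arbitrary choice here
def twiddle_factors(n):
    """Return the n roots of unity exp(-2*pi*i*k/n) used by a length-n DFT."""
    return tuple(cmath.exp(-2j * cmath.pi * k / n) for k in range(n))


def dft(x):
    """Naive DFT that looks its twiddle factors up in the cache."""
    n = len(x)
    w = twiddle_factors(n)
    return [sum(x[j] * w[(j * k) % n] for j in range(n)) for k in range(n)]


first = dft([1, 0, 0, 0])   # computes and caches the factors for n = 4
second = dft([0, 1, 0, 0])  # same length, so the factors come from the cache
print(twiddle_factors.cache_info())
```

Repeated transforms of the same size are common in practice, which is why even a small cache can pay off.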

What is coming up next?

My uarray PR was already merged over the weekend, so I can update my scipy.fft code and the benchmarks there. I also plan on using the new version of pypocketfft to add support for Hermitian FFTs (like numpy's hfft). This would make scipy.fft a complete replacement for numpy.fft's functionality.
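
For readers unfamiliar with Hermitian FFTs, this short sketch uses numpy's existing `hfft` to show what the transform does: it takes only the non-negative-frequency half of a Hermitian-symmetric signal and returns the purely real spectrum. The input values are made up for illustration.

```python
import numpy as np

# Half of a Hermitian-symmetric signal (made-up values). The first and last
# entries are real, as the symmetry requires when the full length is even.
half = np.array([1.0, 2.0 + 1.0j, 3.0 - 2.0j, 4.0])
n = 2 * (len(half) - 1)  # length of the full signal, 6 here

# Rebuild the full signal explicitly so that full[n - k] == conj(full[k]).
full = np.concatenate([half, np.conj(half[-2:0:-1])])

via_fft = np.fft.fft(full)        # complex output, imaginary part ~ 0
via_hfft = np.fft.hfft(half, n)   # real output, computed from the half input

print(np.allclose(via_fft.imag, 0.0), np.allclose(via_fft.real, via_hfft))
```

The Hermitian transform does the same job as the full complex FFT while reading roughly half the input and producing a real array directly.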

I can also work on adding pre-planned transforms to the scipy.fft interface. This would also require making pocketfft's plan caching much more flexible, so I expect we can add user config options to our automatic plan caching.

Did you get stuck anywhere?

No significant blockers this week.

 


scipy.fft has been merged!

peterbell10
Published: 07/01/2019

What did you do this week?

My PR for scipy.fft has been merged into SciPy's master branch and is now slated for release in version 1.4. I've been responding to some late feedback with follow-up PRs, mostly focused on documentation.

I've also started learning the Python C-API and written a proof-of-concept implementation of uarray's __ua_function__ protocol in C++. It still relies on some Python code, which is slowing it down, but even so I've measured around a 2x reduction in function call overhead.

What is coming up next?

Now that scipy.fft has been merged into SciPy, I can open a new draft PR with the uarray code. This should allow for more feedback from the SciPy community and we can move towards getting backend support into scipy.fft.
While that is ongoing I can continue to work on my C++ implementation of uarray to move more of it into native code and bring down the overhead further.

pypocketfft has also just released an update with a new interface that should allow for a nice refactoring of the scipy.fft code.

Did you get stuck anywhere?

I spent a lot of my time trying to debug some segfaults that were happening in uarray's test suite. I assumed the issue must be in my C++ code since I was unfamiliar with the Python C-API. However, after a lot of investigation the bugs turned out to be in NumPy and Dask, not my code.

 


Fourth Weekly Check-in

peterbell10
Published: 06/24/2019

What did you do this week?

My main scipy.fft PR is now very close to being merged and so I've been responding to some final review comments and making minor tweaks. Otherwise, I'm keeping the changes minimal to give it a chance to be reviewed fully.

I've also been collaborating with the author of the pypocketfft library: many of the changes I made for my PR have now been merged into the upstream repo. I've also been having discussions with him about improving the API, such as adding real-to-complex inverse FFTs and a cache for FFT plans to improve performance of repeated transforms. These have already been prototyped and show promising results.

On the backend front, I've been actively contributing to uarray. I've been working with the maintainer to solidify the protocol, as well as contributing a new feature to remove default-valued positional arguments. It's currently looking like the performance overhead of uarray is the biggest hurdle so I've been benchmarking the overhead involved and discussing how to mitigate that.

What is coming up next?

I plan to rewrite the core of uarray using the Python C-API to try to bring the overhead down to a more acceptable level. This would be my first time using the C-API, so progress may be slow at first, but since this is key to the backend system's performance it should be well worth the effort.

Once my first PR is merged, I can also open a new PR which updates to the latest version of pypocketfft which has a reworked API and also includes many of the features I added for use in SciPy. Updating now should also make it easier to maintain in future since it will reduce the size of the patch added over the library.

Did you get stuck anywhere?

I haven't had any real blockers this week and I think good progress has been made.

 


Third Weekly Check-in

peterbell10
Published: 06/17/2019

What did you do this week?

I started off looking into uarray, a library for multi-method dispatch for different backends. I created a working prototype for uarray support in scipy.fft and used this to start benchmarking the overheads involved. I've also been looking into its design and had online discussions with one of the authors.

I also made some requested changes to my scipy.fft pull request, including updating the tutorials and benchmark suite. This revealed some serious performance regressions for 1D FFTs, as compared with scipy.fftpack. So, I diverted my attention to resolving these issues, which resulted in an over 10x improvement for small FFTs.


What is coming up next?

I need to evaluate the uarray approach in comparison to my custom backend solution that I implemented last week. Of particular interest is the performance overhead, which will need to be compared and possibly optimised. There are also a number of design choices that differ between uarray and my custom solution which I will need to reconsider and discuss on the mailing list.

Did you get stuck anywhere?

No significant blockers, although the work on optimising for 1D FFTs did take time away from the backend work that I had planned to focus on.

 
