Week #4: Memory Leaks :(

Published: 07/04/2021

What did you do this week?

This week was, by far, the most challenging week for me! Turns out, resolving the memory leaks issue is not as simple as patching UNU.RAN and changing the order of frees and calls to the error handler :/ I was able to discover this because of a very helpful suggestion by Nicholas (one of my mentors) to run valgrind on my changes (see the discussion on tirthasheshpatel/unuran#1).

==25301== LEAK SUMMARY:
==25301==    definitely lost: 13,728 bytes in 58 blocks
==25301==    indirectly lost: 43,520 bytes in 387 blocks
==25301==      possibly lost: 166,439 bytes in 114 blocks
==25301==    still reachable: 3,553,979 bytes in 2,254 blocks
==25301==         suppressed: 0 bytes in 0 blocks

These memory leaks occur due to the non-local returns in the thunk and in the error handler. Unfortunately, this behavior is ingrained in the way I have designed the API to handle errors and a major refactor is required to fix the issue! tirthasheshpatel/scipy#9 is my first (and hopefully last) attempt aimed at doing this. It refactors the Cython wrapper to use the MessageStream API to handle the errors occurring in UNU.RAN and PyErr_Occurred() to detect the errors occurring in Python callbacks.

The MessageStream API was introduced and written by Pauli Virtanen (@pv on GitHub) while writing wrappers for Qhull. MessageStream uses FILE * streams to log errors occurring in the C API to a temporary file (either in memory or on disk depending on whether open_memstream is available). Once the execution of the C function is complete, one can check the file for errors and raise them in Python. One of the downsides of this approach is that UNU.RAN contains a global FILE * variable which is not thread-safe. Hence, thread-safety needs to be provided in the Cython wrapper itself which further complicates things. I have used a module-level lock which is acquired every time before calling the unur_set_stream function (which is responsible for changing the global FILE * variable) and is released once the required set of C functions have been evaluated. Finally, valgrind seems to be happy with this and reports no memory leaks!

==44175== LEAK SUMMARY:
==44175==    definitely lost: 1,128 bytes in 11 blocks
==44175==    indirectly lost: 0 bytes in 0 blocks
==44175==      possibly lost: 195,264 bytes in 124 blocks
==44175==    still reachable: 3,258,070 bytes in 2,112 blocks
==44175==         suppressed: 0 bytes in 0 blocks

(The 1128 lost bytes are inside dlopen.c library and not UNU.RAN wrappers.)

A very similar thing has been done by Pauli Virtanen in Qhull and getting his help and reviews on tirthasheshpatel/scipy#9 would be invaluable! I hope that my approach is correct this time and the whole fuss about memory leaks resolves as soon as possible.

What is coming up next?

The issue I delineate in the previous section is the bottleneck blocking #14215 from receiving more reviews and merging. I hope to continue my work on tirthasheshpatel/scipy#9 and get it approved and/or merged by the end of this or next week. I also hope this doesn't bring more surprises down the line and I can continue my work on #14215 more smoothly!

Did you get stuck anywhere?

Not yet... :)