tirthasheshpatel's Blog

Week #6: More Benchmarks

tirthasheshpatel
Published: 07/18/2021

What did you do this week?

I continued writing benchmarks for UNU.RAN methods that sample from continuous distributions. The results of these benchmarks are quite promising: on average, the UNU.RAN methods outperform the default rvs method in SciPy by about 100x. In some cases, when the PDF/CDF is expensive to evaluate, the UNU.RAN methods are up to 10,000 times faster than the default rvs method. It would be a nice future project to add specialized rvs methods for distributions where UNU.RAN performs significantly better than SciPy. For more details and some pretty plots, please look at tirthasheshpatel#10. With the help of Ralf Gommers, the UNU.RAN submodule was transferred to the SciPy organization this week. The new submodule now lives at scipy/unuran.
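To make the comparison concrete, here is a hedged sketch of the kind of timing behind these numbers, written against the API as it later landed in scipy.stats.sampling (requires SciPy >= 1.8); the distribution class, seed, and sample size are illustrative choices, and exact speedups vary by distribution:

```python
import timeit

import numpy as np
from scipy import stats
from scipy.stats.sampling import NumericalInversePolynomial

class GenNormLike:
    """Generalized normal with beta = 1.5; an unnormalized PDF is enough."""
    def pdf(self, x):
        return np.exp(-np.abs(x) ** 1.5)

urng = np.random.default_rng(42)
gen = NumericalInversePolynomial(GenNormLike(), random_state=urng)

n = 100_000
# best of 3 runs for each sampler
t_unuran = min(timeit.repeat(lambda: gen.rvs(n), number=1, repeat=3))
t_default = min(timeit.repeat(
    lambda: stats.gennorm.rvs(1.5, size=n, random_state=urng),
    number=1, repeat=3))
print(f"NumericalInversePolynomial: {t_unuran:.4f}s, default rvs: {t_default:.4f}s")
```

Once the (fast) setup step is done, sampling only needs cheap polynomial evaluations, which is where the speedup over a generic rvs path comes from.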

What is coming up next?

I am hoping to get some reviews on my PR over the next week. Other than that, I don't have anything specific planned. I will maybe benchmark the UNU.RAN methods that sample from discrete distributions...

Did you get stuck anywhere?

Nope. Thankfully, another smooth week.

Week #5: First Phase Ends...

tirthasheshpatel
Published: 07/12/2021

What did you do this week?

The PR resolving the memory leaks is still up for review. While it was being reviewed, I thought it would be nice to think ahead and look at the potential performance of the methods I propose to add. So I created a new branch gsoc-unuran-bench on my fork and started wrapping the remaining methods from the Excel sheet I had written a couple of weeks ago. I then wrote a small Python script to benchmark all the wrapped methods against the NumPy random number generator. For now, I have only used two distributions, Standard Normal and Beta(2, 3), and plan to add more in the following weeks. Sampling was run 3 times per measurement. The results of the benchmark:
  • UNU.RAN's methods (namely NumericalInversePolynomial and AutomaticRatioOfUniforms) were 3x faster than the NumPy RNG for the Beta(2, 3) distribution.
  • NumPy RNG was slightly faster than UNU.RAN's methods (with NumericalInversePolynomial and AutomaticRatioOfUniforms being the closest to the performance of the NumPy RNG) to sample from the Standard Normal distribution.
It is good to see that there is a possibility of improving the performance of sampling from some distributions once the methods from UNU.RAN are integrated into SciPy.
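The harness behind these measurements can be sketched roughly as follows (a simplified stand-in for the actual script, showing only the NumPy side; each measurement keeps the best of 3 runs, as described above):

```python
import timeit

import numpy as np

def best_of_3(sampler, n=100_000):
    """Time `sampler(n)` three times and keep the best run."""
    return min(timeit.repeat(lambda: sampler(n), number=1, repeat=3))

rng = np.random.default_rng(0)
t_beta = best_of_3(lambda n: rng.beta(2.0, 3.0, size=n))
t_norm = best_of_3(lambda n: rng.standard_normal(n))
print(f"NumPy Beta(2, 3): {t_beta:.4f}s, NumPy N(0, 1): {t_norm:.4f}s")
```

Taking the minimum over repeats is the usual way to reduce noise from other processes when micro-benchmarking.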

What is coming up next?

There are already some reviews on the PR resolving the memory leaks, and I hope that by the end of next week there will be even more, so we can decide whether we want to use that approach in SciPy. It's a tricky, unconventional approach, so I am not sure how many reviews would be considered "enough" or how much time it will take the maintainers to review it properly. But while that is going on, I hope to start wrapping methods to sample from discrete distributions and benchmark them against the NumPy RNG.

Did you get stuck anywhere?

No. This was more or less a smooth week...

Marking the end of Phase 1

The GSoC page says that the first-phase reviews start tomorrow. So this seems like a good time to summarize the progress of Phase 1 here:
  • Filed the main PR; tests pass.
  • SciPy builds with UNU.RAN.
  • Separated UNU.RAN into its own submodule.
  • Created wrappers for one continuous and one discrete generator.
  • Wrote basic benchmarks.
  • Wrote basic tests.
  • Wrote a strong documentation suite and tutorials.
  • Wrote extra benchmarks on my fork for all the continuous methods in UNU.RAN.
According to my proposal, I have successfully achieved my first and second milestones and made a head start on the third, ahead of time! Most of the points above are still under review and might change if need be. Resolving the memory-leak issue will still be a challenge, both for the maintainers and for me, but I hope it is done before the end of the second phase so that we can merge, test out some of the new functionality, and iterate on the design. Let's hope for the best for what's coming!

Week #4: Memory Leaks :(

tirthasheshpatel
Published: 07/04/2021

What did you do this week?

This week was, by far, the most challenging week for me! Turns out, resolving the memory leaks is not as simple as patching UNU.RAN and changing the order of the frees and the calls to the error handler :/ I discovered this thanks to a very helpful suggestion from Nicholas (one of my mentors) to run valgrind on my changes (see the discussion on tirthasheshpatel/unuran#1).

==25301== LEAK SUMMARY:
==25301==    definitely lost: 13,728 bytes in 58 blocks
==25301==    indirectly lost: 43,520 bytes in 387 blocks
==25301==      possibly lost: 166,439 bytes in 114 blocks
==25301==    still reachable: 3,553,979 bytes in 2,254 blocks
==25301==         suppressed: 0 bytes in 0 blocks

These memory leaks occur due to the non-local returns in the thunk and in the error handler. Unfortunately, this behavior is ingrained in the way I have designed the API to handle errors and a major refactor is required to fix the issue! tirthasheshpatel/scipy#9 is my first (and hopefully last) attempt aimed at doing this. It refactors the Cython wrapper to use the MessageStream API to handle the errors occurring in UNU.RAN and PyErr_Occurred() to detect the errors occurring in Python callbacks.

The MessageStream API was introduced and written by Pauli Virtanen (@pv on GitHub) while writing wrappers for Qhull. MessageStream uses FILE * streams to log errors occurring in the C API to a temporary file (either in memory or on disk depending on whether open_memstream is available). Once the execution of the C function is complete, one can check the file for errors and raise them in Python. One of the downsides of this approach is that UNU.RAN contains a global FILE * variable which is not thread-safe. Hence, thread-safety needs to be provided in the Cython wrapper itself which further complicates things. I have used a module-level lock which is acquired every time before calling the unur_set_stream function (which is responsible for changing the global FILE * variable) and is released once the required set of C functions have been evaluated. Finally, valgrind seems to be happy with this and reports no memory leaks!

==44175== LEAK SUMMARY:
==44175==    definitely lost: 1,128 bytes in 11 blocks
==44175==    indirectly lost: 0 bytes in 0 blocks
==44175==      possibly lost: 195,264 bytes in 124 blocks
==44175==    still reachable: 3,258,070 bytes in 2,112 blocks
==44175==         suppressed: 0 bytes in 0 blocks

(The 1,128 lost bytes come from dlopen.c, not from the UNU.RAN wrappers.)
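As an aside, LEAK SUMMARY blocks like the two above are easy to diff programmatically across runs; here is a small hedged helper (my own illustration, not part of the PR):

```python
import re

# Matches lines such as:
# ==25301==    definitely lost: 13,728 bytes in 58 blocks
SUMMARY_RE = re.compile(
    r"==\d+==\s+(definitely lost|indirectly lost|possibly lost|"
    r"still reachable|suppressed):\s+([\d,]+) bytes in ([\d,]+) blocks")

def parse_leak_summary(text):
    """Return {kind: (bytes, blocks)} from a valgrind LEAK SUMMARY."""
    return {
        kind: (int(b.replace(",", "")), int(n.replace(",", "")))
        for kind, b, n in SUMMARY_RE.findall(text)
    }

report = """\
==25301== LEAK SUMMARY:
==25301==    definitely lost: 13,728 bytes in 58 blocks
==25301==    indirectly lost: 43,520 bytes in 387 blocks
"""
summary = parse_leak_summary(report)
print(summary["definitely lost"])  # (13728, 58)
```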

A very similar thing was done by Pauli Virtanen for Qhull, so getting his help and reviews on tirthasheshpatel/scipy#9 would be invaluable! I hope my approach is correct this time and the whole fuss about memory leaks gets resolved as soon as possible.
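The error-handling pattern described above can be sketched in pure Python (a toy analogue of the Cython code, with stand-in names; the real implementation works on C FILE * streams via open_memstream):

```python
import io
import threading

_stream_lock = threading.Lock()
_global_stream = None  # stands in for UNU.RAN's global FILE * stream

def set_stream(stream):
    """Stand-in for unur_set_stream(): swap the global error stream."""
    global _global_stream
    _global_stream = stream

def failing_c_call():
    """Stand-in for a C routine that logs errors instead of raising."""
    _global_stream.write("unuran error: PDF not T-concave\n")
    return None  # a NULL return signals failure

def call_with_messagestream():
    buf = io.StringIO()  # per-call in-memory "file"
    with _stream_lock:   # the global stream is not thread-safe
        set_stream(buf)
        result = failing_c_call()
    # back in "Cython": inspect the buffer and raise any logged errors
    if result is None:
        raise RuntimeError(buf.getvalue().strip())
    return result
```

The key point is that the C routine runs to completion (so it can free its own allocations) and the error only becomes a Python exception after control returns to the wrapper.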

What is coming up next?

The issue I described in the previous section is the bottleneck blocking #14215 from receiving more reviews and being merged. I hope to continue my work on tirthasheshpatel/scipy#9 and get it approved and/or merged by the end of this week or the next. I also hope this doesn't bring more surprises down the line and that I can continue my work on #14215 more smoothly!

Did you get stuck anywhere?

Not yet... :)

Week #3: Separating UNU.RAN in its own submodule

tirthasheshpatel
Published: 06/27/2021

What did you do this week?

This week went into addressing last week's blockers and separating UNU.RAN into its own submodule. I had put the UNU.RAN source code directly in my PR, which pushed the patch size over 100,000 LOC and made it very difficult to review. So I spent a couple of days creating a separate repository for the UNU.RAN source code and using git submodule to clone it into SciPy. I also wrote a Python script to download and clean UNU.RAN for use in SciPy. This reduced the patch from over 100,000 LOC to only about 4,000 LOC. There were also some comments on the tutorial and benchmarks, which I addressed this week.

What is coming up next?

Last week, I also noticed a memory leak, so I decided to write a Python script to find more. The script reports 40 to 50 memory leaks throughout the source code, but the part of the source code used by SciPy has only about 9 or 10. In the last meeting, we came up with two potential ways to get rid of them: (i) patch UNU.RAN and reverse the order of the frees and the calls to the error handler, or (ii) instead of jumping out of the C function, use the return code of the UNU.RAN functions to set and raise the error. The problem with approach (ii) is that some functions don't return an error code. So I plan to test the first approach this week and hopefully address all the memory leaks. We also decided to try writing a higher-level API on top of the String API and to get other devs' opinions on it on the mailing list.

Did you get stuck anywhere?

No blockers this week!

Week #2: Working on the PR

tirthasheshpatel
Published: 06/20/2021

What did you do this week?

This week was spent mostly polishing my pull request and addressing reviews. I got a few big things out of the way, though. Firstly, I refactored the API to accept a single dist object containing all the required methods. Secondly, I wrote a tutorial documenting the usage of the new API. And lastly, I wrote a benchmark suite to profile the setup and sampling stages of each sampler. Moreover, with a lot of help from Bas (@BvB93 on GitHub), I was able to resolve all the MyPy static typing errors and get the MyPy check passing. While adding tests for the seed parameter, I noticed that I had made a mistake in handling the old NumPy RandomState API. Because I had used global variables to sample from the NumPy RNG, seeding a generator broke: the underlying (global) NumPy RNG was overridden by a new RNG as soon as a new generator with a seed was created. Thankfully, I quickly found a way to avoid the global variables and the tests started passing again. One of my mentors, Christoph, was interested in using UNU.RAN's test suite to write strong tests in SciPy. I have started looking into it and have ported a few tests, but this is still a work in progress.
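The seeding bug can be reproduced in miniature like this (class names are made up for illustration; the fix is simply to give each generator its own RNG instead of sharing a module-level one):

```python
import numpy as np

_global_rng = np.random.RandomState()  # shared, module-level state

class LeakyGenerator:
    """Buggy: constructing a seeded generator reseeds *everyone*."""
    def __init__(self, seed=None):
        global _global_rng
        _global_rng = np.random.RandomState(seed)  # clobbers shared state
    def rvs(self):
        return _global_rng.random_sample()

class FixedGenerator:
    """Fixed: each generator owns its RNG, so streams are independent."""
    def __init__(self, seed=None):
        self._rng = np.random.RandomState(seed)
    def rvs(self):
        return self._rng.random_sample()

g1 = LeakyGenerator(seed=123)
g2 = LeakyGenerator(seed=456)  # silently reseeds g1's stream too!
print(g1.rvs() == np.random.RandomState(456).random_sample())  # True
```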

What is coming up next?

I have gotten a lot of work done on the PR and it's shaping up nicely: the main components have been written, and most tests pass. With this, I hope to mark the PR as ready for review soon. I will have to make sure that I have added sufficient tests and documentation. Also, the new code lacks comments, which may give reviewers a difficult time; I aim to clean up the newly added code and write more comments explaining the parts that might be tricky to understand. I also need to clean up the license file. There was also interest in separating UNU.RAN into a submodule. I hope to address some of these points in the upcoming week.

Did you get stuck anywhere?

I faced a weird 32-bit Linux failure related to my changes. When the randint distribution is passed to the DAU method, it fails with an "unknown error" in UNU.RAN. I was able to localize the error but failed to find its cause. I suspect floating-point errors, but a deeper inspection is needed. For the time being, as this isn't inside SciPy (and only occurs on a very specific platform with an old NumPy version), I have skipped that test case. This also led to an unpleasant revelation: memory leaks :/ This is turning into more of a can of worms than I had initially expected. Sometimes UNU.RAN frees allocated memory after calling the error handler. But the error handler is designed to jump out of the C code and return to the Cython code, where the error can be raised safely. The allocated memory is then never freed, leading to a memory leak. I am not sure how often this happens, but it might be something to investigate in more depth. I will see if it is substantial and look into what can be done.
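The leak pattern can be illustrated with pure-Python stand-ins for the C machinery (a toy model, not the real code): because the error handler performs a non-local return out of the "C" routine, the free that follows it never executes.

```python
allocations = []  # tracks live "C" allocations

def error_handler(msg):
    raise RuntimeError(msg)  # jumps straight back to the "Cython" caller

def c_routine(fail=False):
    buf = object()
    allocations.append(buf)            # stands in for malloc()
    if fail:
        error_handler("unknown error")  # non-local return ...
    allocations.remove(buf)            # ... so this free() is skipped

try:
    c_routine(fail=True)
except RuntimeError:
    pass
print(len(allocations))  # 1 -- the "allocated" buffer leaked
```

On the happy path (fail=False) the allocation is released normally; only the error path leaks, which is why these leaks only showed up once error cases started being exercised.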