tirthasheshpatel's Blog

Week #2: Working on the PR

tirthasheshpatel
Published: 06/20/2021

What did you do this week?

This week was spent mostly polishing my pull request and addressing review comments. I did get a few big things out of the way, though. First, I refactored the API to accept a single dist object containing all the required methods. Second, I wrote a tutorial documenting the usage of the new API. Lastly, I wrote a benchmark suite to profile the setup and sampling stages of each sampler. Moreover, with a lot of help from Bas (@BvB93 on GitHub), I was able to resolve all the MyPy static typing errors and get the MyPy check passing.

While adding tests for the seed parameter, I noticed that I had made a mistake in handling the old NumPy RandomState API. Because I had used global variables to sample from the NumPy RNG, seeding a generator broke: the underlying (global) NumPy RNG was overridden by a new RNG as soon as a new generator with a seed was created. Thankfully, I quickly found a way to avoid the global variables, and the tests started passing again.

One of my mentors, Christoph, was interested in using UNU.RAN's test suite to write strong tests in SciPy. I have started looking into its test suite and have ported a few tests, but this is still a work in progress.
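To make the seeding bug easier to picture, here is a minimal sketch of the failure mode and one way around it. The class names (GlobalStateSampler, PerInstanceSampler) are made up for illustration and are not the classes in the PR; only np.random.RandomState is a real NumPy API. The idea is simply that a module-level RNG gets replaced whenever a new seeded sampler is created, while a per-instance RNG does not.

```python
import numpy as np

# Module-level RNG shared by every sampler: this is the problematic pattern.
_global_rng = None


class GlobalStateSampler:
    """Hypothetical sampler that stores its RNG in a global variable."""

    def __init__(self, seed=None):
        global _global_rng
        # Creating a new sampler silently replaces the RNG of all older ones.
        _global_rng = np.random.RandomState(seed)

    def rvs(self, size):
        return _global_rng.uniform(size=size)


class PerInstanceSampler:
    """Hypothetical sampler that keeps its RNG on the instance instead."""

    def __init__(self, seed=None):
        self._rng = np.random.RandomState(seed)

    def rvs(self, size):
        return self._rng.uniform(size=size)


# What a sampler seeded with 1 should produce.
expected = np.random.RandomState(1).uniform(size=3)

first = GlobalStateSampler(seed=1)
second = GlobalStateSampler(seed=2)  # overrides the RNG used by `first`
broken = first.rvs(3)                # actually drawn from the seed=2 stream

good_first = PerInstanceSampler(seed=1)
good_second = PerInstanceSampler(seed=2)  # does not affect `good_first`
fixed = good_first.rvs(3)
```

With the global variable, `first` no longer reproduces the stream for its own seed once `second` exists; with the per-instance RNG it does.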

What is coming up next?

I have got a lot of work done on the PR and it's shaping up nicely: the main components have been written and most tests pass. With this, I hope to mark the PR as open for review soon. I will have to make sure that I have added sufficient tests and documentation. Also, the new code lacks comments, which may make it hard to review. I aim to clean up the newly added code and write more comments to explain the parts that might be tricky to understand. I also need to clean up the license file. There was also interest in separating UNU.RAN into a submodule. I hope to address some of these points in the upcoming week.

Did you get stuck anywhere?

I faced a weird 32-bit Linux failure which was related to my changes. When the randint distribution is passed to the DAU method, it fails with an "unknown error" in UNU.RAN. I was able to localize the error but could not find the reason for the failure. I suspect floating-point errors, but a deeper inspection is needed. For the time being, as the failure isn't inside SciPy (and only exists on a very specific platform with an old NumPy version), I have skipped that test case.

This also led to an unpleasant discovery: memory leaks :/ This is turning into more of a can of worms than I had initially expected. Sometimes UNU.RAN frees allocated memory only after calling the error handler. But the error handler is designed to jump out of the C code and return to the Cython code, where the error can be raised safely. When that happens, the code that frees the memory never runs, which leads to a memory leak. I am not sure how often this happens, but it might be worth investigating in more depth. I will see if the leak is substantial and look into what can be done.
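The control flow behind the leak can be mimicked in plain Python. This is only an analogy, not UNU.RAN or Cython code: the exception stands in for the non-local jump out of the C code, a set stands in for the heap, and every name here (library_setup, jumping_handler, and so on) is hypothetical. The second handler, which merely records the error and returns, is shown for contrast and is not necessarily what the PR will end up doing.

```python
class ErrorJump(Exception):
    """Stands in for the non-local jump performed by the error handler."""


live_allocations = set()  # stands in for memory that is currently allocated


def allocate(name):
    live_allocations.add(name)
    return name


def free(name):
    live_allocations.discard(name)


def jumping_handler(message):
    # Leaves the library code immediately, like jumping back to the wrapper.
    raise ErrorJump(message)


def library_setup(error_handler):
    """Mimics a library routine that frees memory *after* reporting an error."""
    buf = allocate("workspace")
    error_handler("unknown error")
    free(buf)  # never reached if the handler jumps out


def wrapper(error_handler):
    try:
        library_setup(error_handler)
    except ErrorJump as exc:
        return str(exc)
    return None


# Case 1: the handler jumps out, so the cleanup is skipped.
message = wrapper(jumping_handler)
leaked = set(live_allocations)

# Case 2: a handler that only records the error lets the cleanup run.
live_allocations.clear()
recorded = []
wrapper(recorded.append)
leaked_when_recording = set(live_allocations)
```

In the first case "workspace" is still marked as allocated after the wrapper returns; in the second case nothing is left behind.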

Week #1: Creating a (big) PR

tirthasheshpatel
Published: 06/12/2021

What did you do this week?

This week I submitted an overview of the progress on the mailing list (here) and created a pull request on SciPy (#14215). Thankfully, all the tests pass and SciPy builds with UNU.RAN on all the required platforms! I also created some flowcharts to explain the design of the internal API and show how callbacks are acquired and released. Finally, I tried writing a higher-level API (tirthasheshpatel/scipy#8), as suggested by one of my mentors.

What is coming up next?

We have discussed quite a lot of points to keep me busy for the next couple of weeks :). Here they are:
  • Generate/build the UNU.RAN tests and try integrating them into the SciPy test suite.
  • Maybe figure out a way to improve performance on NumPy < 1.19.
  • Write better/stronger tests.
  • Mock up an API that uses the same object interface, i.e. bundle all functions together in a dist parameter.
  • Address code reviews on my PR.
  • Add the relation [of the UNU.RAN API] to the rv_discrete and rv_continuous classes in the tutorial. Add in the docs that the rvs of UNU.RAN methods and SciPy distributions differ.
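As a rough picture of the "bundle all functions together in a dist parameter" item above, here is a toy mock-up. It is a sketch under assumptions: StandardNormal and NaiveRejectionSampler are hypothetical names, and the sampler is a plain rejection sampler written with the standard library, not a UNU.RAN method or the API in the PR. The point is only that the sampler receives one object and looks up the methods it needs on it.

```python
import math
import random


class StandardNormal:
    """Hypothetical dist object bundling the functions a sampler needs."""

    def pdf(self, x):
        # Unnormalized standard normal density; its maximum is 1 at x = 0.
        return math.exp(-0.5 * x * x)


class NaiveRejectionSampler:
    """Toy sampler (not a UNU.RAN method) that accepts a single dist object."""

    def __init__(self, dist, domain, pdf_max, seed=None):
        self._dist = dist
        self._lo, self._hi = domain
        self._pdf_max = pdf_max          # upper bound on dist.pdf over the domain
        self._rng = random.Random(seed)  # per-instance RNG, no global state

    def rvs(self, size):
        samples = []
        while len(samples) < size:
            x = self._rng.uniform(self._lo, self._hi)
            u = self._rng.uniform(0.0, self._pdf_max)
            # Accept x with probability pdf(x) / pdf_max.
            if u <= self._dist.pdf(x):
                samples.append(x)
        return samples


sampler = NaiveRejectionSampler(
    StandardNormal(), domain=(-3.0, 3.0), pdf_max=1.0, seed=123
)
samples = sampler.rvs(1000)
```

A sampler that needs more than the density (say, its derivative) could look for an additional method on the same object, so the constructor signature stays the same across methods.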

Did you get stuck anywhere?

No blockers this week!

Week #0: Polishing the Prototype

tirthasheshpatel
Published: 06/07/2021

A little introduction

Hello everyone! I am Tirth, a final-year computer science undergraduate student at Nirma University in India. I have been using NumPy and SciPy since I started doing scientific computing in my first year of college. I have been contributing to SciPy since last year and hope to continue to do so :). I will be working this summer to integrate the UNU.RAN library into the scipy.stats submodule. UNU.RAN is a C library for Universal Non-Uniform RANdom number generation. It has been used in the ROOT project by CERN, and R bindings for the library (Runuran) have also been created. My goal is to integrate methods for sampling from univariate continuous and discrete distributions.

What did you do this week?

I got to know my mentors, Christoph and Nicholas, in the first week of the community bonding period. Since then, we have been meeting regularly to discuss the API and have exchanged a lot of design ideas. Over the last three weeks, I have been able to significantly enhance my prototype to the point where I feel confident enough to propose a PR on SciPy. I started out with tirthasheshpatel/scipy#5 on my fork, which was thread-unsafe, and made my way up to tirthasheshpatel/scipy#6, which seems to be in very good shape. It builds with UNU.RAN on all the required platforms and tests pass with the exception of a flaky failure. During the last week, I created this Excel sheet with information on the methods I propose to add to SciPy, the parameters to keep, etc. It will help me with coding those methods in the coming weeks and with documenting the decisions properly.

What is coming up next?

As the PR on my fork builds and its tests pass, I hope to create a PR on SciPy by next week. I also aim to send an email to the mailing list about the upcoming PR and try to get feedback from other devs on the design of the API. The coming weeks are critical to the work I aim to finish during GSoC, so I hope to get things done without much contention!

Did you get stuck anywhere?

There have not been any serious blockers during the community bonding period, but the Windows CI failed due to some unrelated Pythran errors. After a few failed attempts to resolve them, the discussion on #13717 helped me fix the failing builds. As pointed out in this comment, I was missing the LLVM and MinGW binaries in the PATH, which caused some weird linking problems for 64-bit builds. It was a moment of relief to see the builds passing on Windows, since the Windows failures worried me the most. Hopefully, everything passes on the SciPy PR that I aim to create by the end of this week :). Fingers crossed!