Post 9

Published: 07/23/2023

In the past couple of weeks, an implementation of sets was completed. This week, I worked on benchmarking this. I also started work on another implementation, and progressed on some earlier dictionary issues.

Set Benchmark (GitHub Gist)

  • This benchmarks the recent hash-set implementation that uses linear-probing for collision-resolution. We compare performance of this LPython data-structure with equivalent structures in Python (set) and C++ (unordered_set).
  • Performance of functions set.add and set.remove is tested.
    • One way to do this is to simply insert and remove elements in order (say, from 1 to 1e7). However, this would not lead to collisions, as the set would be rehashed after reaching a size threshold (in our implementation, this is 0.6 of the capacity at any point).
    • To add some more complexity, we use random numbers. To generate pseudo-random numbers without adding major performance overheads, Lehmer random number generator  is used. In particular, the tests use Schrage's method to avoid 64-bit division. This is a straightforward algorithm that uses linear modular arithmetic to generate random numbers.
  • As can be seen in the results, LPython outperforms both Python and C++. Surprisingly, C++ was slower than the equivalent Python code.
    • When dealing with integers, LPython is more than 4.5 times faster than C++, and more than 3.6 times faster than Python.
    • With strings, these gains are about 1.2 and 1.02, respectively.
  • For C++, we also tested with custom hash functions, to mimic those used in our LPython implementation. This did not affect the results much.

​​​​​​​Set (Separate Chaining) (#2198)

  • ​​​​​​​This is towards adding the separate-chaining collision-resolution technique. In contrast to linear probing, when collisions happen, we extend the pre-existing linked list of elements at that hash value. We already have it in dictionaries, but there are some issues in that implementation.
  • I created all the basic functions, and will fix remaining bugs in the coming week.

​​​​​​​Dictionary (Keys and Values, continued) (#2023)

  • ​​​​​​​I had implemented this in a previous week to work with the linear-probing implementation of dictionaries. Now, I extended these functions to also work with the separate-chaining implementation.
  • We iterate over all the hash values, and if there is a corresponding linked list, we iterate over that, putting keys/values into a list, the latter returned at the end.