Articles on kabra1110's Blog

Post 12

virendrakabra14@gmail.com (kabra1110) — Thu, 17 Aug 2023 02:24:48 +0000

Post 11

virendrakabra14@gmail.com (kabra1110) — Thu, 10 Aug 2023 03:19:39 +0000

Post 10

virendrakabra14@gmail.com (kabra1110) — Sun, 30 Jul 2023 15:17:54 +0000

Last week, I started another implementation of sets, which uses Separate Chaining for collision resolution. This is the second implementation, the first being based on Linear Probing. This week, I completed it, fixing bugs along the way and identifying a few more in the existing dictionary implementation.

Set (Separate Chaining, continued) (#2198)

As described in the last post, this implementation uses one linked list per hash value to allow for more than one element having the same hash.

The earlier implementation had bugs that prevented it from correctly identifying duplicates in the set. In addition, the remove function was not actually removing elements, due to incorrect assignment of variables. I fixed these and other issues, and covered all of them in the integration tests.
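To make the two fixed behaviours concrete, here is a minimal Python sketch of a separate-chaining set. It is not the LPython implementation: the names `Node` and `ChainedSet`, the fixed bucket count, and the use of Python's built-in `hash` are all assumptions made for illustration. The points to notice are that `add` walks the chain to detect duplicates, and `remove` has to relink the predecessor (or the bucket head) rather than only reassign a local variable.

```python
class Node:
    """One entry in a bucket's singly linked list (hypothetical helper)."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class ChainedSet:
    """Illustrative hash set using separate chaining with a fixed bucket count."""
    def __init__(self, capacity=8):
        self.buckets = [None] * capacity
        self.size = 0

    def _index(self, value):
        return hash(value) % len(self.buckets)

    def __contains__(self, value):
        node = self.buckets[self._index(value)]
        while node is not None:
            if node.value == value:
                return True
            node = node.next
        return False

    def add(self, value):
        if value in self:
            return  # duplicate: leave the chain untouched
        i = self._index(value)
        # Prepend the new node to the chain stored at this hash value.
        self.buckets[i] = Node(value, self.buckets[i])
        self.size += 1

    def remove(self, value):
        i = self._index(value)
        prev, node = None, self.buckets[i]
        while node is not None:
            if node.value == value:
                if prev is None:
                    self.buckets[i] = node.next  # unlink the head of the chain
                else:
                    prev.next = node.next        # unlink from the middle or end
                self.size -= 1
                return
            prev, node = node, node.next
        raise KeyError(value)
```

With 4 buckets, the values 1, 5 and 9 all land in the same chain, which makes it easy to exercise both the duplicate check and the unlinking logic.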

To complete this, I benchmarked this implementation (updated gist here). It turns out that this new implementation is not significantly better than the Linear Probing one.

Reserve function (#2232)

I will work on this issue in the next week, as it affects existing benchmarks. The function reserves space for a list up front, at the start of an algorithm. Equivalent functions are available in C++ under the same name.

Post 9

virendrakabra14@gmail.com (kabra1110) — Sun, 23 Jul 2023 15:23:26 +0000

In the past couple of weeks, an implementation of sets was completed. This week, I worked on benchmarking this. I also started work on another implementation, and progressed on some earlier dictionary issues.

Set Benchmark (GitHub Gist)

This benchmarks the recent hash-set implementation that uses linear-probing for collision-resolution. We compare performance of this LPython data-structure with equivalent structures in Python (set) and C++ (unordered_set).
Performance of functions set.add and set.remove is tested.
- One way to do this is to simply insert and remove elements in order (say, from 1 to 1e7). However, this would not lead to collisions, as the set would be rehashed after reaching a size threshold (in our implementation, this is 0.6 of the capacity at any point).
- To add some more complexity, we use random numbers. To generate pseudo-random numbers without adding major performance overhead, a Lehmer random number generator is used. In particular, the tests use Schrage's method to avoid 64-bit division. This is a straightforward algorithm that uses linear modular arithmetic to generate random numbers.
As can be seen in the results, LPython outperforms both Python and C++. Surprisingly, C++ was slower than the equivalent Python code.
- When dealing with integers, LPython is more than 4.5 times faster than C++, and more than 3.6 times faster than Python.
- With strings, these gains are about 1.2 and 1.02, respectively.
For C++, we also tested with custom hash functions, to mimic those used in our LPython implementation. This did not affect the results much.
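As a rough illustration of the generator mentioned above, the following Python sketch implements one Lehmer step with Schrage's method. The modulus 2^31 - 1 and the multiplier 48271 are assumed, commonly used parameters; the post does not say which constants the benchmark uses. Python integers never overflow, so the trick is shown here only to make the arithmetic visible.

```python
M = 2**31 - 1   # modulus (assumed; a Mersenne prime)
A = 48271       # multiplier (assumed; a commonly used choice)
Q = M // A      # quotient used by Schrage's method
R = M % A       # remainder used by Schrage's method


def lehmer_next(x):
    """Return (A * x) % M for 0 < x < M without forming the full product A * x.

    Both A * (x % Q) and R * (x // Q) stay below M, so in a language with
    fixed-width integers the computation fits in 32-bit signed arithmetic.
    """
    x = A * (x % Q) - R * (x // Q)
    return x if x > 0 else x + M


def lehmer_sequence(seed, count):
    """Generate `count` pseudo-random values starting from `seed`."""
    values = []
    x = seed
    for _ in range(count):
        x = lehmer_next(x)
        values.append(x)
    return values
```

A benchmark could then feed `lehmer_sequence(1, n)` into the add and remove calls, so that the inserted keys are scattered rather than sequential.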

Set (Separate Chaining) (#2198)

This is towards adding the separate-chaining collision-resolution technique. In contrast to linear probing, when collisions happen, we extend the pre-existing linked list of elements at that hash value. We already have it in dictionaries, but there are some issues in that implementation.
I created all the basic functions, and will fix remaining bugs in the coming week.

Dictionary (Keys and Values, continued) (#2023)

I had implemented this in a previous week to work with the linear-probing implementation of dictionaries. Now, I extended these functions to also work with the separate-chaining implementation.
We iterate over all the hash values, and if there is a corresponding linked list, we traverse it and append its keys (or values) to a list, which is returned at the end.
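The traversal can be sketched in a few lines of Python. The bucket layout used here, a list whose entries are either `None` or a nested `(key, value, next)` tuple, is a stand-in chosen for brevity and is not how LPython stores its chains.

```python
def chained_keys_values(buckets):
    """Collect keys and values from a separate-chaining table.

    `buckets` is a list; each entry is None (no chain for that hash value)
    or the head of a chain encoded as (key, value, next_entry).
    """
    keys, values = [], []
    for head in buckets:
        node = head
        while node is not None:
            key, value, next_node = node
            keys.append(key)
            values.append(value)
            node = next_node
    return keys, values
```

For example, a table with a two-element chain in one bucket and a single element in another yields the keys in bucket order, then chain order.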

Post 8

virendrakabra14@gmail.com (kabra1110) — Sun, 16 Jul 2023 14:17:30 +0000

I worked on adding more functionality to sets, and fixing some existing issues.

- Sets (#2122, continued)

After the initial implementation was completed last week, I added functions to add and remove elements from sets.

To add an element, its hash is computed, and we find an empty slot in the underlying array. This was done over the linear-probing implementation - if the spot corresponding to this hash is non-empty, a traversal is done starting from that position. Also, if the load factor crosses a threshold, a larger array is allocated, and all of the existing data is rehashed.
To remove an element, the mask value is simply set to a special value (3). This is called a tombstone, marking deletion. At later insertions, this spot can be reused. However, care has to be taken while reading elements - this element must not be read.
During the weekly meet with my mentor, we fixed an issue that wasn't reproducible for me locally, but was showing up on the workflows. This was occurring due to non-initialization of an interface member.
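The add and remove logic described above can be sketched in Python as follows. Only the tombstone value 3 and the 0.6 load threshold (mentioned in a later post) come from these posts; the class name `ProbingSet`, the other mask values, the doubling growth policy, and the `filled` counter are assumptions for illustration.

```python
EMPTY, OCCUPIED, TOMBSTONE = 0, 1, 3  # 3 is from the post; 0 and 1 are assumed
MAX_LOAD = 0.6                        # rehash threshold as a fraction of capacity


class ProbingSet:
    """Illustrative hash set using linear probing with tombstones."""
    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.mask = [EMPTY] * capacity
        self.size = 0    # live elements
        self.filled = 0  # slots that are OCCUPIED or TOMBSTONE

    def _find(self, value):
        i = hash(value) % len(self.slots)
        while self.mask[i] != EMPTY:  # tombstones do not stop the probe
            if self.mask[i] == OCCUPIED and self.slots[i] == value:
                return i
            i = (i + 1) % len(self.slots)
        return -1

    def __contains__(self, value):
        return self._find(value) != -1

    def add(self, value):
        if value in self:
            return
        if self.filled + 1 > MAX_LOAD * len(self.slots):
            self._rehash()
        i = hash(value) % len(self.slots)
        while self.mask[i] == OCCUPIED:  # stop at an EMPTY slot or a tombstone
            i = (i + 1) % len(self.slots)
        if self.mask[i] == EMPTY:
            self.filled += 1             # reusing a tombstone keeps filled unchanged
        self.slots[i] = value
        self.mask[i] = OCCUPIED
        self.size += 1

    def remove(self, value):
        i = self._find(value)
        if i == -1:
            raise KeyError(value)
        self.slots[i] = None
        self.mask[i] = TOMBSTONE         # mark deletion without breaking probe chains
        self.size -= 1

    def _rehash(self):
        live = [v for v, m in zip(self.slots, self.mask) if m == OCCUPIED]
        capacity = 2 * len(self.slots)
        self.slots = [None] * capacity
        self.mask = [EMPTY] * capacity
        self.size = self.filled = 0
        for v in live:                   # tombstones are dropped here
            self.add(v)
```

Counting tombstones towards the load threshold is one way to guarantee that a probe always reaches an empty slot; other designs are possible.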

- Nested Dictionaries (branch nested_dict)

There is an existing issue (#1111) with the current dictionary implementation, which causes exceptions and segfaults when using nested dictionaries like dict[i32, dict[i32, i32]]. There was also an issue with assignment of empty dictionaries, i.e., d: dict[i32, dict[i32, i32]] = {}. This is now fixed (not merged yet). For the nesting, I found that the issue is with nested DictItem ASR nodes being present, which in turn try to read items that are not yet present. It seems that the solution is to have a DictInsert node at the innermost level, while keeping other nodes as is. I will try to fix this in the coming weeks.

- Set benchmarking

For the next week, I will also work on benchmarking our set implementation against C++ STL sets. We expect to perform (at least) on par with these, but if we find issues with some types (e.g. set of strings), we will also add the separate chaining implementation, as done for dictionaries.

Post 7

virendrakabra14@gmail.com (kabra1110) — Sun, 09 Jul 2023 16:19:07 +0000

I worked on introducing sets and adding functionality to existing data structures.

- Set (#2122)

We already had an initial implementation for dictionaries. Sets are similar, but unlike dictionaries, they do not have key-value pairs. As with dictionaries, an abstract set interface is created, so we can use different collision-resolution techniques. There is some duplication of code, such as hash computation, but for now we focus on providing basic functionality for sets.

I used linear-probing collision-resolution, where we find the next empty spot starting from the hash position. While deleting an element, we mark it with a special marker (tombstone). This helps in ignoring it in further reads, and indicates that elements with the same hash can be present at later positions.

Further, I implemented functions to write items into the set. This involved resolving collisions for reading and writing, and also for rehashing the entire set while increasing the underlying list size. This technique is not effective for string elements, as in that case we just compute hashes of the character-array pointers. We will use separate chaining for this in the coming weeks.

Next, I worked on functions for deep-copying sets and retrieving their length.

- Dictionary keys and values (#2023)

I corrected earlier functions to obtain lists of keys and values, when linear probing is used. This involved using the key-mask to identify whether elements are set or not.

Directly testing this did not work, as CPython does not return list items for these functions. Rather, it returns a view, with limited functionality; for example, we cannot index a view. Thus, I tried copying the results into a list, but it turns out that LPython does not fully support nested iterables with dictionaries yet. I will work on this in the next week.
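The CPython behaviour referred to here can be seen in a few lines of plain Python (run under CPython, not LPython): a keys view cannot be indexed, while a list copy can.

```python
d = {1: 2, 3: 4, 5: 6}

keys_view = d.keys()  # a dict view, not a list
try:
    keys_view[0]
    view_is_indexable = True
except TypeError:
    view_is_indexable = False  # views do not support indexing

keys_list = list(keys_view)    # copy the view into a real list
first_key = keys_list[0]

d[7] = 8                       # views reflect later changes, the copy does not
view_length = len(keys_view)
list_length = len(keys_list)
```

This is why a test that compares against CPython has to convert the view with `list(...)` before indexing into it.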

- List comparison (#2025)

Earlier, I had implemented the less than operation for lists and tuples. I extended this to other comparison operations, and used LLVM's CreateCmp instructions that allow specifying a predicate. This helps prevent unnecessary replication of code, and we just switch the predicate using an overload_id.
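The idea of writing the comparison once and switching only the predicate can be illustrated in Python. This is an analogy rather than the LLVM code: the function name `compare_sequences` is hypothetical, and the `operator` functions stand in for the comparison predicate.

```python
import operator


def compare_sequences(a, b, predicate):
    """Lexicographically compare two sequences with a single shared routine.

    `predicate` is one of operator.lt, operator.le, operator.gt, operator.ge.
    The first differing pair of elements decides the result; if one sequence
    is a prefix of the other, the lengths decide.
    """
    for x, y in zip(a, b):
        if x != y:
            return predicate(x, y)
    return predicate(len(a), len(b))
```

For example, `compare_sequences([1, 2], [1, 2, 3], operator.lt)` agrees with `[1, 2] < [1, 2, 3]` in CPython, and the same function handles the other three orderings without duplicated loops.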

---

Next week, I will continue adding more functions to sets, and try to resolve the nested dictionary issue.

Post 6

virendrakabra14@gmail.com (kabra1110) — Sun, 09 Jul 2023 15:12:43 +0000

I was out for the week of 26 June.

Post 5

virendrakabra14@gmail.com (kabra1110) — Sun, 25 Jun 2023 16:37:46 +0000

This week, I continued to improve list, tuple, and dictionary data structures.

Last week, I worked on an improvement to support nested tuples. In some parts, I made use of raw memory allocation using new and delete. This was pointed out in the PR comments.

I corrected this (PR 2018), using an allocator for memory management. This is used in general throughout the existing code as well, preventing memory leaks.

Further, I worked on adding some functionality to dictionaries. Issue 1881 asks for support for iterating over a dictionary, as in:

d: dict[i32, i32]
d = {1:2, 3:4, 5:6}
for k in d:
    print(k)

This code iterates over the dictionary keys. We discussed how to implement this in our meeting, and concluded that it would be good to first implement dict.keys and dict.values as in CPython. For now, we are not using views, but rather returning lists.

I implemented some of this (PR 2023), but it turns out that the keys list has some extra values. I will discuss this further.

Next, I also started work on supporting list comparison (Issue 1832). At the moment, we only have equality comparison. I started by implementing the less-than operation (PR 2025). I am trying to abstract out the code shared by the various comparison operations, so that the same code is not repeated.

Post 4

virendrakabra14@gmail.com (kabra1110) — Sun, 18 Jun 2023 17:37:54 +0000

I worked on improving existing data structures:

1932 - Support nested tuples.
- For example, tuple[tuple[i32, f64], i32]
1940 - Support optional parameters in list index method.
- Now the method signature is list.index(x[, start[, end]]), as in CPython.
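For reference, this is how the optional `start` and `end` arguments behave in CPython, which is the behaviour the new signature follows. The sample list is made up for illustration.

```python
items = [10, 20, 30, 10, 20]

first = items.index(10)            # search the whole list
after_first = items.index(10, 1)   # search from position 1 onwards
in_window = items.index(20, 0, 2)  # search only items[0:2]

try:
    items.index(30, 0, 2)          # 30 is outside the window items[0:2]
    found_outside_window = True
except ValueError:
    found_outside_window = False
```

Note that the returned position is always relative to the start of the full list, not to `start`.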

In the coming week, I will continue to add and improve other data structure methods.

Post 3

virendrakabra14@gmail.com (kabra1110) — Sun, 11 Jun 2023 18:05:35 +0000

I continued work on adding methods to existing data structures, and improving others.

Added list repeat (1888). Fixed warnings from a couple of other functions.
Fixed an issue to disallow a zero step in list and string slicing (1879).
Completed work for 1845 - there was an issue in popping from nested iterables.
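The three items above correspond to the following CPython behaviour, shown here as a reference point. This assumes the LPython changes aim to match CPython, and the sample values are made up.

```python
# List repeat: multiplying a list by an integer repeats its elements.
repeated = [1, 2] * 3

# Zero step in slicing is rejected for both lists and strings.
def slicing_rejects_zero_step(sequence):
    try:
        sequence[::0]
        return False
    except ValueError:
        return True

list_rejects = slicing_rejects_zero_step([1, 2, 3])
string_rejects = slicing_rejects_zero_step("abc")

# Popping from a nested list returns the inner list itself.
nested = [[1, 2], [3, 4]]
last = nested.pop()
```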

For the next week, I plan to work on known issues. These include:

Supporting nested tuples (1886)
List index with optional arguments

Post 2

virendrakabra14@gmail.com (kabra1110) — Sun, 04 Jun 2023 18:13:42 +0000

I worked on the following PRs:

Continued work on list pop (1845)
Tuple concatenation (1865)
List reverse for nested iterables (1866)
Support increment operator for dictionaries (1843)

Post 1

virendrakabra14@gmail.com (kabra1110) — Sat, 27 May 2023 19:39:05 +0000

I am working on the project - Implementing and improving advanced data structures, with mentor Gagandeep Singh.

Currently I am working on improving the list data structure by adding more functions and fixing issues. Further, functions are being created using the IntrinsicFunction architecture. This avoids registering new nodes in the grammar for each method.

Some PRs that I have worked on:

1676: Add list count
1703: Add list index
1758: Add list reverse
1805: Add list pop

I am also working on issues related to other data structures such as dictionary, tuple, etc. raised on the repository.

In the coming weeks, I plan to continue working on these features, and later implement other data structures.