Articles on kabra1110's Bloghttps://blogs.python-gsoc.orgUpdates on different articles published on kabra1110's BlogenThu, 17 Aug 2023 02:24:48 +0000Post 12https://blogs.python-gsoc.org/en/kabra1110s-blog/post-12/<p><a href="https://virendrakabra14.github.io/posts/gsoc/lpython/post-12/">Link</a></p>virendrakabra14@gmail.com (kabra1110)Thu, 17 Aug 2023 02:24:48 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-12/Post 11https://blogs.python-gsoc.org/en/kabra1110s-blog/post-11/<p><span style="font-size: 20px;"><a href="https://virendrakabra14.github.io/posts/gsoc/lpython/post-11/">Link</a></span></p>virendrakabra14@gmail.com (kabra1110)Thu, 10 Aug 2023 03:19:39 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-11/Post 10https://blogs.python-gsoc.org/en/kabra1110s-blog/post-10/<p><span style="font-size: 20px;">In the last week, I started with another implementation of sets, that uses Separate Chaining for collision-resolution. This was the second implementation, the first being with Linear Probing. This week, I completed this implementation, fixing bugs, and identifying some more in the existing dictionary implementation.</span></p> <p><strong><span style="font-size: 20px;">Set (Separate Chaining, continued) (#2198)</span></strong></p> <p><span style="font-size: 20px;">As described in the last post, this implementation uses one linked list per hash value to allow for more than one element having the same hash.</span></p> <p><span style="font-size: 20px;">The earlier implementation had bugs that did not correct identify duplicates in the set. In addition, remove function was not actually removing elements due to incorrect assignment of variables. I fixed these and other issues, testing all of these in the integration tests.</span></p> <p><span style="font-size: 20px;">To complete this, I benchmarked this implementation (updated gist <a href="https://gist.github.com/kabra1110/fe86159c81a7cfed7c4f52e11e7fa3c6">here</a>). It turns out that this new implementation is not significantly better than the Linear Probing one.</span></p> <p><span style="font-size: 20px;"><strong>Reserve function (#2232)</strong></span></p> <p><span style="font-size: 20px;">I will work on this issue in the next week, as it affects existing benchmarks. This basically reserves space for a list at the start of an algorithm. Equivalent functions are available in C++ with the same name</span></p>virendrakabra14@gmail.com (kabra1110)Sun, 30 Jul 2023 15:17:54 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-10/Post 9https://blogs.python-gsoc.org/en/kabra1110s-blog/post-9/<p><span style="font-size: 20px;">In the past couple of weeks, an implementation of sets was completed. This week, I worked on benchmarking this. I also started work on another implementation, and progressed on some earlier dictionary issues.</span></p> <p><strong><span style="font-size: 20px;">Set Benchmark (<a href="https://gist.github.com/kabra1110/fe86159c81a7cfed7c4f52e11e7fa3c6">GitHub Gist</a>)</span></strong></p> <ul> <li><span style="font-size: 20px;">This benchmarks the recent hash-set implementation that uses linear-probing for collision-resolution. We compare performance of this LPython data-structure with equivalent structures in Python (<span style="font-family: Courier New,Courier,monospace;">set</span>) and C++ (<span style="font-family: Courier New,Courier,monospace;">unordered_set</span>).</span></li> <li><span style="font-size: 20px;">Performance of functions <span style="font-family: Courier New,Courier,monospace;">set.add</span> and <span style="font-family: Courier New,Courier,monospace;">set.remove</span> is tested.</span> <ul> <li><span style="font-size: 20px;">One way to do this is to simply insert and remove elements in order (say, from 1 to 1e7). However, this would not lead to collisions, as the set would be rehashed after reaching a size threshold (in our implementation, this is 0.6 of the capacity at any point).</span></li> <li><span style="font-size: 20px;">To add some more complexity, we use random numbers. To generate pseudo-random numbers without adding major performance overheads, <a href="https://en.wikipedia.org/wiki/Lehmer_random_number_generator">Lehmer random number generator</a>  is used. In particular, the tests use <a href="https://en.wikipedia.org/wiki/Lehmer_random_number_generator#Schrage's_method">Schrage's method</a> to avoid 64-bit division. This is a straightforward algorithm that uses linear modular arithmetic to generate random numbers.</span></li> </ul> </li> <li><span style="font-size: 20px;">As can be seen in the results, LPython outperforms both Python and C++. Surprisingly, C++ was slower than the equivalent Python code.</span> <ul> <li><span style="font-size: 20px;">When dealing with integers, LPython is more than 4.5 times faster than C++, and more than 3.6 times faster than Python.</span></li> <li><span style="font-size: 20px;">With strings, these gains are about 1.2 and 1.02, respectively.</span></li> </ul> </li> <li><span style="font-size: 20px;">For C++, we also tested with custom hash functions, to mimic those used in our LPython implementation. This did not affect the results much.</span></li> </ul> <p><span style="font-size: 20px;">​​​​​​​<strong>Set (Separate Chaining) (#2198)</strong></span></p> <ul> <li><span style="font-size: 20px;">​​​​​​​This is towards adding the separate-chaining collision-resolution technique. In contrast to linear probing, when collisions happen, we extend the pre-existing linked list of elements at that hash value. We already have it in dictionaries, but there are some issues in that implementation.</span></li> <li><span style="font-size: 20px;">I created all the basic functions, and will fix remaining bugs in the coming week.</span></li> </ul> <p><span style="font-size: 20px;">​​​​​​​<strong>Dictionary (Keys and Values, continued) (#2023)</strong></span></p> <ul> <li><span style="font-size: 20px;">​​​​​​​I had implemented this in a previous week to work with the linear-probing implementation of dictionaries. Now, I extended these functions to also work with the separate-chaining implementation.</span></li> <li><span style="font-size: 20px;">We iterate over all the hash values, and if there is a corresponding linked list, we iterate over that, putting keys/values into a list, the latter returned at the end.</span></li> </ul>virendrakabra14@gmail.com (kabra1110)Sun, 23 Jul 2023 15:23:26 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-9/Post 8https://blogs.python-gsoc.org/en/kabra1110s-blog/post-8/<p><span style="font-size: 20px;">I worked on adding more functionality to sets, and fixing some existing issues.</span></p> <p><span style="font-size: 20px;">- Sets (#2122, continued)</span></p> <p><span style="font-size: 20px;">After the initial implementation was completed last week, I added functions to add and remove elements from sets.</span></p> <ul> <li><span style="font-size: 20px;">To add an element, its hash is computed, and we find an empty slot in the underlying array. This was done over the linear-probing implementation - if the spot corresponding to this hash is non-empty, a traversal is done starting from that position. Also, if the load factor crosses a threshold, a larger array is allocated, and all of the existing data is rehashed.</span></li> <li><span style="font-size: 20px;">To remove an element, the mask value is simply set to a special value (3). This is called a tombstone, marking deletion. At later insertions, this spot can be reused. However, care has to be taken while reading elements - this element must not be read.</span></li> <li><span style="font-size: 20px;">During the weekly meet with my mentor, we fixed an issue that wasn't reproducible for me locally, but was showing up on the workflows. This was occurring due to non-initialization of an interface member.</span></li> </ul> <p><span style="font-size: 20px;">- Nested Dictionaries (branch <span style="font-family: Courier New,Courier,monospace;">nested_dict</span>)</span></p> <p><span style="font-size: 20px;">There is an existing issue (#1111) with the current dictionary implementation, which causes exceptions and segfaults when using nested dictionaries like <span style="font-family: Courier New,Courier,monospace;">dict[i32, dict[i32, i32]]</span>. There was also an issue with assignment of empty dictionaries, i.e., <span style="font-family: Courier New,Courier,monospace;">d: dict[i32, dict[i32, i32]] = {}</span>. This is now fixed (not merged yet). For the nesting, I found that the issue is with nested <span style="font-family: Courier New,Courier,monospace;">DictItem</span> ASR nodes being present, which in turn try to read items that are not yet present. It seems that the solution is to have a <span style="font-family: Courier New,Courier,monospace;">DictInsert</span> node at the innermost level, while keeping other nodes as is. I will try to fix this in the coming weeks.</span></p> <p><span style="font-size: 20px;">- Set benchmarking</span></p> <p><span style="font-size: 20px;">For the next week, I will also work on benchmarking our set implementation against C++ STL sets. We expect to perform (at least) at par with these, but if we find issues with some types (e.g. set of strings), we will also add the separate chaining implementation, as done for dictionaries.</span></p>virendrakabra14@gmail.com (kabra1110)Sun, 16 Jul 2023 14:17:30 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-8/Post 7https://blogs.python-gsoc.org/en/kabra1110s-blog/post-7/<p><span style="font-size: 20px;">I worked on introducing sets and adding functionality to existing data structures.</span></p> <p><span style="font-size: 20px;">- Set (#2122)</span></p> <p><span style="font-size: 20px;">We already had initial implementation for dictionaries. Sets are similar, but unlike dictionaries, do not have key-value pairs. As with dictionaries, an abstract set interface is created, so we can use different collision-resolution techniques. There is some duplication of code such as hash computation, but for now we focus on providing basic functionality for set.</span></p> <p><span style="font-size: 20px;">I used linear-probing collision-resolution, where we find the next empty spot starting from the hash position. While deleting an element, we mark it with a special marker (tombstone). This helps in ignoring it in further reads, and indicates that elements with the same hash can be present at later positions.</span></p> <p><span style="font-size: 20px;">Further, I implemented functions to write items into the set. This involved resolving collisions for reading and writing, and also for rehashing the entire set while increasing the underlying list size. This technique is not effective for string elements, as in that case we just compute hashes of the character-array pointers. We will use separate chaining for this in the coming weeks.</span></p> <p><span style="font-size: 20px;">Next, I worked on functions for deep-copying sets, and retrieving its length.</span></p> <p><span style="font-size: 20px;">- Dictionary keys and values (#2023)</span></p> <p><span style="font-size: 20px;">I corrected earlier functions to obtain lists of keys and values, when linear probing is used. This involved using the key-mask to identify whether elements are set or not.</span></p> <p><span style="font-size: 20px;">Directly testing this did not work, as CPython does not return list items for these functions. Rather, it returns a view, with limited functionality; for example, we cannot index a view. Thus, I tried copying the results into a list, but it turns out that LPython does not fully support nested iterables with dictionaries yet. I will work on this in the next week.</span></p> <p><span style="font-size: 20px;">- List comparison (#2025)</span></p> <p><span style="font-size: 20px;">Earlier, I had implemented the less than operation for lists and tuples. I extended this to other comparison operations, and used LLVM's CreateCmp instructions that allow specifying a predicate. This helps prevent unnecessary replication of code, and we just switch the predicate using an overload_id.</span></p> <p><span style="font-size: 20px;">---</span></p> <p><span style="font-size: 20px;">Next week, I will continue adding more functions to sets, and try to resolve the nested dictionary issue.</span></p>virendrakabra14@gmail.com (kabra1110)Sun, 09 Jul 2023 16:19:07 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-7/Post 6https://blogs.python-gsoc.org/en/kabra1110s-blog/post-6/<p><span style="font-size: 20px;">I was out for the week of 26 June.</span></p>virendrakabra14@gmail.com (kabra1110)Sun, 09 Jul 2023 15:12:43 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-6/Post 5https://blogs.python-gsoc.org/en/kabra1110s-blog/post-5/<p><span style="font-size: 20px;">This week, I continued to improve list, tuple, and dictionary data structures.</span></p> <p><span style="font-size: 20px;">Last week, I had worked on an improvement to support nested tuples. In some parts, I made use of raw memory allocation using <span style="font-family: Courier New,Courier,monospace;">new</span> and <span style="font-family: Courier New,Courier,monospace;">delete</span>. This was pointed in the PR comments.</span></p> <p><span style="font-size: 20px;">I corrected this (PR 2018), using an allocator for memory management. This is used in general throughout the existing code as well, preventing memory leaks.</span></p> <p><span style="font-size: 20px;">Further, I worked on adding some functionality to dictionaries. Issue 1881 is to support iteration through a dictionary as</span></p> <pre><code class="language-python">d: dict[i32, i32] d = {1:2, 3:4, 5:6} for k in d: print(k)</code></pre> <p><span style="font-size: 20px;">This code iterates over the dictionary keys. We discussed in our meet on how to implement this, and concluded that it would be good to first implement <span style="font-family: Courier New,Courier,monospace;">dict.keys</span> and <span style="font-family: Courier New,Courier,monospace;">dict.values</span> as in CPython. For now, we are not using views, but rather returning lists.</span></p> <p><span style="font-size: 20px;">I implemented some of this (PR 2023), but it turns out that the keys list has some extra values. I will discuss on this further.</span></p> <p><span style="font-size: 20px;">Next, I also started work on supporting list comparison (Issue 1832). At the moment, we only have equality comparison. I started with implementing less than operation (PR 2025). I am trying to abstract out code for various comparison operations, so that same code is not used repeatedly.</span></p>virendrakabra14@gmail.com (kabra1110)Sun, 25 Jun 2023 16:37:46 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-5/Post 4https://blogs.python-gsoc.org/en/kabra1110s-blog/post-4/<p><span style="font-size: 20px;">I worked on improving existing data structures</span></p> <ul> <li><span style="font-size: 20px;">1932 - Support nested tuples.</span> <ul> <li><span style="font-size: 20px;">For example, <span style="font-family: Courier New,Courier,monospace;">tuple[tuple[i32, f64], i32]</span></span></li> </ul> </li> <li><span style="font-size: 20px;">1940 - Support optional parameters in list index method.</span> <ul> <li><span style="font-size: 20px;">Now the method signature is <span style="font-family: Courier New,Courier,monospace;">list.index(x[, start[, end]])</span>, as in CPython.</span></li> </ul> </li> </ul> <p><span style="font-size: 20px;">In the coming week, I will continue to add and improve other data structure methods.</span></p>virendrakabra14@gmail.com (kabra1110)Sun, 18 Jun 2023 17:37:54 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-4/Post 3https://blogs.python-gsoc.org/en/kabra1110s-blog/post-3/<p><span style="font-size: 20px;">I continued work on adding methods to existing data structures, and improving others.</span></p> <ul> <li><span style="font-size: 20px;">Added list repeat (1888). Fixed warnings from a couple of other functions</span></li> <li><span style="font-size: 20px;">Fixed an issue to disallow zero step in list, string slicing (1879).</span></li> <li><span style="font-size: 20px;">Completed work for 1845 - there was an issue in popping from nested iterables</span></li> </ul> <p><span style="font-size: 20px;">For the next week, I plan to work on known issues. This includes</span></p> <ul> <li><span style="font-size: 20px;">Supporting nested tuples (1886)</span></li> <li><span style="font-size: 20px;">List index with optional arguments</span></li> </ul>virendrakabra14@gmail.com (kabra1110)Sun, 11 Jun 2023 18:05:35 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-3/Post 2https://blogs.python-gsoc.org/en/kabra1110s-blog/post-2/<p><span style="font-size: 20px;">I worked on the following PRs:</span></p> <ul> <li><span style="font-size: 20px;">Continued work on list pop (1845)</span></li> <li><span style="font-size: 20px;">Tuple concatenation (1865)</span></li> <li><span style="font-size: 20px;">List reverse for nested iterables (1866)</span></li> <li><span style="font-size: 20px;">Support increment operator for dictionaries (1843)</span></li> </ul>virendrakabra14@gmail.com (kabra1110)Sun, 04 Jun 2023 18:13:42 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-2/Post 1https://blogs.python-gsoc.org/en/kabra1110s-blog/post-1/<p><span style="font-size: 20px;">I am working on the project - Implementing and improving advanced data structures, with mentor <a href="https://github.com/czgdp1807">Gagandeep Singh</a>.</span></p> <p><span style="font-size: 20px;">Currently I am working on improving the list data structure by adding more function and fixing issues. Further, functions are being created using the IntrinsicFunction architecture. This helps prevent registering new nodes in the grammar for each method.</span></p> <p><span style="font-size: 20px;">Some PRs that I have worked on</span></p> <ul> <li><span style="font-size: 20px;">1676: Add list count</span></li> <li><span style="font-size: 20px;">1703: Add list index</span></li> <li><span style="font-size: 20px;">1758: Add list reverse</span></li> <li><span style="font-size: 20px;">1805: Add list pop</span></li> </ul> <p><span style="font-size: 20px;">I am also working on issues related to other data structures such as dictionary, tuple, etc. raised on the repository.</span></p> <p><span style="font-size: 20px;">In the next weeks, I plan to continue working on these features, and later implement other data structures.</span></p>virendrakabra14@gmail.com (kabra1110)Sat, 27 May 2023 19:39:05 +0000https://blogs.python-gsoc.org/en/kabra1110s-blog/post-1/