Published: 06/20/2021

Hi Python community members! My name is Rahul and I'm an incoming Master's student in Informatics at TU Munich. I am stoked about being accepted to GSoC with Activeloop!

The problem I'll be working on this GSoC is interesting and challenging. Datasets are often modified, and there is no efficient way to check whether two datasets are identical. This gets worse for large datasets that don't fit in memory (1TB+). My project aims to design a hashing technique for comparing such large-scale, out-of-core machine learning datasets.
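
To make the idea concrete, here is a minimal sketch of one simple baseline, not the technique the project will end up with. It assumes a dataset is just a directory of files and streams each file through SHA-256 in fixed-size chunks, so memory use stays bounded no matter how large the dataset is. The names `file_digest` and `dataset_digest`, the 1 MiB chunk size, and the choice of SHA-256 are all illustrative assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Hash one file by streaming it in chunks (never loads the whole file)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dataset_digest(root, chunk_size=1 << 20):
    """Combine per-file digests into one digest for a directory tree.

    Assumption: the dataset is a plain directory of files. Files are visited
    in sorted order of their relative paths so the result does not depend on
    the order in which the filesystem lists them.
    """
    root = Path(root)
    files = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file()
    )
    h = hashlib.sha256()
    for rel, p in files:
        # Include the relative path so renaming a file changes the digest.
        h.update(rel.encode("utf-8"))
        h.update(b"\0")
        h.update(file_digest(p, chunk_size).encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()
```

Two directories with the same relative paths and the same bytes get the same digest, and changing a single byte changes it. The catch is that this baseline still reads every byte of both datasets, which is exactly the cost that becomes painful at 1TB+.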