Eshan's Blog

Week 2-3 Coding and Research 🧐

Published: 06/14/2021

The Python Software Foundation provides a pretty template, so I have decided to use that to answer 3 basic questions about my week's progress.

What did I do this week?

I have been in touch with my mentor Dyllan, and we have begun brainstorming on how to approach our project. Last week's work on Index Map was taken over by Dyllan, as it was a little ambiguous and complex for me. Still, I am glad I gave it a shot, because no effort ever goes to waste.
Next up, I am going through a tonne of Kaggle datasets and figuring out how their structures correlate. I am using Notability and Notion for highlighting and creating a decent layout of my work.
Here's a look behind the scenes: 🐳



What will I do next week?

I plan to go through a lot more datasets to cover as many edge cases as possible. In parallel, I am working on uploading datasets to Hub using its latest Hub 2.0 alpha version, which was released last week.

Did I get stuck anywhere?

I was unable to upload datasets to Hub. It turns out there was a bug that is actively being worked on, and the proactive community recommended that I use Hub 2.0 to proceed further.

Well, that's all for now. I will write a comprehensive one next week! ✨

Week 1-2 (Community Bonding Period) 🌎

Published: 06/07/2021

Hey friends!
Happy to say that I'm off to a great start to Google Summer of Code! The first 2 weeks are the Community Bonding Period; the idea behind this is to get a good grasp of the codebase and connect with the wonderful people at the organisation.
My community bonding period started with an insightful discussion with my mentor Dyllan McCreary, who walked me through the new codebase Activeloop has been working on for Hub 2.0. He was kind enough to get deep into the important parts of Hub. Dyllan assigned me 3 tasks that would help me get familiarised with the codebase.
This session was quite helpful, and now that I have tasks assigned to me, I can get a head start on the summer of code!

Task 1: Depickle the code!

As a general practice, using pickle in your code is advised against, because unpickling untrusted data can execute arbitrary code. Hence, I was required to replace all occurrences of pickle.dumps() and pickle.loads() with something else. (json did the trick!) For more info on why pickle shouldn't be used, check this out.
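To make the swap concrete, here is a minimal sketch of replacing a pickle round trip with json. The `meta` record and the `dumps`/`loads` helper names are made up for illustration and are not taken from the Hub codebase.

```python
import json

# Hypothetical metadata record, standing in for whatever was being pickled.
meta = {"chunk_names": ["c0", "c1"], "start_byte": 0, "end_byte": 128}


def dumps(obj) -> bytes:
    """Serialise a JSON-compatible object to bytes (stand-in for pickle.dumps)."""
    return json.dumps(obj).encode("utf-8")


def loads(data: bytes):
    """Parse bytes back into Python objects (stand-in for pickle.loads)."""
    return json.loads(data.decode("utf-8"))


restored = loads(dumps(meta))
```

One thing to keep in mind with this approach: json only handles basic types (dicts, lists, strings, numbers, booleans, None), and tuples come back as lists, so custom objects need to be converted to plain data first.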

Task 2: Convert IndexMap to a list of IndexMapEntry

This task took the longest to complete. We went through multiple ways of doing this and decided to stick with the class-based approach. I was required to convert IndexMap from a list of dictionaries to a list of IndexMapEntry (a new class). This meant creating new classes for both IndexMap and IndexMapEntry, followed by writing tests for them. In the course of this task, Dyllan introduced me to the namedtuple data type and also helped me get started with parametrising tests. I have implemented the classes and tests; however, this broke some of the other code the team has tirelessly put together :(
I am currently working on it and it should be done by today 🤞🏻
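For anyone curious what moving from a list of dictionaries to a list of named-tuple entries can look like, here is a rough sketch. The field names (`chunk_name`, `start_byte`, `end_byte`) and the `to_entries` helper are my own assumptions for illustration; the actual IndexMapEntry in Hub 2.0 may look quite different.

```python
from typing import List, NamedTuple


class IndexMapEntry(NamedTuple):
    # Field names are hypothetical, chosen only for this example.
    chunk_name: str
    start_byte: int
    end_byte: int


def to_entries(raw: List[dict]) -> List[IndexMapEntry]:
    """Convert a list of plain dictionaries into a list of IndexMapEntry."""
    return [IndexMapEntry(**item) for item in raw]


# The "before" shape: a list of dictionaries.
raw_index_map = [
    {"chunk_name": "c0", "start_byte": 0, "end_byte": 64},
    {"chunk_name": "c1", "start_byte": 64, "end_byte": 128},
]

# The "after" shape: entries with attribute access and fixed fields.
index_map = to_entries(raw_index_map)
```

Each entry can now be read as `index_map[0].chunk_name` instead of `raw_index_map[0]["chunk_name"]`, and a misspelled key fails at construction time. If the tests use pytest, cases like the two entries above can be fed to a single test function through `pytest.mark.parametrize`.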

Task 3: Modify Read/Write fixtures for cache in Storage tests

This was a 2-minute task but it helped me get familiar with a few storage things that happen under the hood in Hub 2.0.

All these tasks were designed to give me a solid head start, in the hope of making me feel at home when working on my project this summer!

Apart from these tasks, we witnessed the launch of the Alpha version of Hub 2.0 on 3rd June 2021. It was a wonderful launch where the team showcased all the shiny features coming to Hub 2.0, along with a progress report on what has been implemented. The results displayed more than a 6x increase in performance compared to Hub 1.0. The presentation by Activeloop's CEO, Davit Buniatyan, showed that what the team has achieved with Hub 2.0 is truly remarkable!
Things are just getting started and I am beyond excited for what is to come! ♥️

Welcome to my GSoC journey ✨

Published: 06/07/2021

Hello there!
I am Eshan Arora (@thisiseshan everywhere on the internet), a senior-year computer engineering undergrad at NIT Surat, India.
Welcome to my blog! I am thrilled to share that I will be working on Hub 2.0 this summer!

Hub aims to reduce the time researchers spend figuring out data by enabling dataset streaming, thus allowing data scientists to spend more of their time on building epic Machine Learning models.

My project this summer is to implement automatic schema generation in Hub 2.0, which would enable any kind of dataset to be seamlessly stored on Hub with just a single line of code.
I am excited to be given this opportunity and aim to bring the best out of it this summer!

You can check out Hub here
For Hub 2.0, check out the release/2.0 branch! 🚀