Eshan's Blog


Published: 08/20/2021

Hey friends!
Thank you for being a part of this wonderful journey. The past 10 weeks were some of the best I have ever had and I am glad to have shared them with you.

Let's get on with the finale check-in with these simple questions.

What did I do this week?

Following up on last week's blog, all three PRs were merged. I also sent out another PR to fix an ingestion summary bug along with some Colab fixes.
Additionally, I presented my final project on Friday to the Activeloop team. They're quite happy with it and I can't wait for everyone to start using Hub auto!

What will I do next week?

I plan to continue contributing to Hub as much as I can. I love the package and I'm grateful to be working with such smart people.

Did I get stuck anywhere?

I got stuck returning the correct number of ingested files in a directory, so I sent out a new PR that temporarily removes this feature and keeps the ingestion summary simple.

Thank you for reading this, I am grateful for your time and hope you were able to learn something from my journey.

I'm gonna miss this a lot.

What a week! 3 PRs in one week, loving this! 🔥

Published: 08/15/2021

Hey friends!
Welcome to my blog post on my Google Summer of Code '21 journey. The last week is about to begin and we'll be in the endgame soon enough. This post, however, is about the 3 new features I have been working on to get into Hub auto quickly before we wrap up GSoC.

3 PRs for 3 features:
  • Kaggle fixture
  • auto-compression
  • ingestion summary

Without further ado, let's talk about these features.

Kaggle fixture [merged]

Right now, all Kaggle tests run as regular tests. With this PR, however, I have added a --kaggle option so the Kaggle pytests only run when explicitly requested. This makes Hub auto testing for Kaggle datasets efficient and convenient.
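For anyone curious how an opt-in flag like this is typically wired up, here's a minimal conftest.py sketch. The hook names are standard pytest, but the marker wiring is my illustration, not Hub's exact code:

```python
# conftest.py -- hypothetical sketch of an opt-in --kaggle flag,
# not the exact Hub implementation.
import pytest


def pytest_addoption(parser):
    # Register a --kaggle command-line flag (off by default).
    parser.addoption(
        "--kaggle", action="store_true", default=False,
        help="run tests that download Kaggle datasets",
    )


def pytest_collection_modifyitems(config, items):
    # Skip any test marked @pytest.mark.kaggle unless --kaggle was passed.
    if config.getoption("--kaggle"):
        return
    skip_kaggle = pytest.mark.skip(reason="needs --kaggle option to run")
    for item in items:
        if "kaggle" in item.keywords:
            item.add_marker(skip_kaggle)
```

With this in place, plain pytest runs stay fast, and pytest --kaggle opts in to the slow, credential-dependent tests.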

Auto-compression [in review]

Hub auto did everything automatically except figuring out the compression type for the dataset. This PR makes that possible: as long as the dataset contains supported file types (.jpeg, .png, .jpg), setting the compression argument to "auto" will automatically figure out the compression type of the entire dataset.

Ingestion summary [in review]

Now, once the ingestion process is complete, it's pretty neat to show the user all the files that were skipped due to errors in the dataset.
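The idea behind such a summary can be as simple as collecting failures while ingesting. A minimal sketch, where ingest_with_summary and ingest_one are illustrative names rather than Hub's actual API:

```python
# Hypothetical sketch of an ingestion summary; names are illustrative,
# not Hub's actual API.
def ingest_with_summary(paths, ingest_one):
    """Try to ingest every file, collecting the ones that fail."""
    skipped = []
    for path in paths:
        try:
            ingest_one(path)
        except Exception:
            skipped.append(path)
    # Report what was skipped so the user can inspect or fix those files.
    print(f"Ingested {len(paths) - len(skipped)}/{len(paths)} files.")
    for path in skipped:
        print(f"Skipped: {path}")
    return skipped
```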

Check this out:

When all files under the directory "images_classification" are ingested.

When "images_classification/lol.json" and "images_classification/class0/lol.json" are skipped during ingestion.

To conclude: 3 amazing features. 1 merged, 2 in review, coming to Hub in a few days.

with ♥️, E

Week 8-9: Hub auto PR merged 🥰 + New stuff 🐳

Published: 08/07/2021

Without further ado, let's take a look at what I've been doing for the past week!

What did I do this week?

Following up on last week's blog, I fixed a few issues and implemented the suggestions provided by the Activeloop community (Dyllan, Abhinav, Fariz and Ivo). With that, my code passes all test cases and I received 4 PR approvals!
The PR got merged instantly!

Without wasting any time, I started working on the next steps for Hub auto because we've got a lot planned!

A couple of things that are coming up:
  • Auto-detect compression type
  • Hub auto docs
  • Kaggle pytest fixture
  • Ingestion summary
  • and more..

I specifically targeted an issue where the code fails if the Kaggle credentials aren't stored as environment variables on the system. I then worked on a script that auto-detects the compression type of the dataset. I submitted a PR for that here, but I think I'm going to submit another one that includes the auto-detect script and the Kaggle pytest fixture.

What will I do next week?

I'm going to finish up the automatic compression detection. This will let users simply mention the dataset they wish to ingest and boom! The code will figure out the most optimal image compression type and ingest it into a Hub dataset.

Did I get stuck anywhere?

I wouldn't say I was "stuck" anywhere. I just went ahead and learned what I didn't know. I figured out the auto-detection of compression and that used just a fraction of my brain power hehe.

Blog Post #5: PR to the main branch 🌱

Published: 07/31/2021

Hey friends!
I have always considered working on open-source projects to be a lot like watering my plants: watching them burgeon brings pride and happiness to my heart. This is my first PR for Hub auto and I can't wait for the world to use it.

The past 2 weeks have been all about:
  • Fine-tuning the API code
  • Writing quality tests
  • Solving bugs
  • Providing well-defined docstrings

My PR has been reviewed and is about to get merged real soon. I am VERY excited for people to try out Hub auto and use it as the primary method of ingestion for image classification datasets.
Currently, I'm updating my code with the suggestions I receive from the community in the form of PR reviews, and I'd love to summarise that work in this blog!

This PR provides 2 APIs:

The ingest API provides a one-line solution to get your locally stored image dataset onto Hub.


The ingest_kaggle API provides a one-line solution to get your Kaggle dataset onto Hub.
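Based on the descriptions above, usage would look roughly like this. The argument values, paths, and exact signatures here are placeholders of my own; check the Hub docs for the real ones:

```python
# Illustrative usage sketched from the post's description; the signatures
# and argument values are assumptions, not Hub's documented API.
def ingest_local_example():
    import hub  # requires `pip install hub`

    # One line: walk a local image-classification folder and upload it.
    return hub.ingest("./images_classification", "hub://<username>/animals")


def ingest_kaggle_example():
    import hub

    # One line: download a Kaggle dataset and ingest it into Hub.
    return hub.ingest_kaggle(
        "<owner>/<dataset>", "./kaggle_download", "hub://<username>/animals"
    )
```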

I believe this is going to open doors to a lot of new Hub use cases and make onboarding new users to Hub a breeze!
For anyone interested, my PR is available here!

Week 6-7 check-in: Submitted draft PR 🚀

Published: 07/23/2021

Let's take a look at what I've been doing for the past week!

What did I do this week?

Following up on last week's blog, I have been working consistently on writing good test cases, and I am glad to share that I finished writing the tests this week (yesterday, to be more precise). My mentor Dyllan asked me to create another branch, feature/2.0/hub-auto2, as the branch from last week had serious merge conflicts that couldn't be resolved.
Additionally, I wrote tests to increase the code's coverage using pytest.raises(), and I made the code more robust by handling a few important edge cases.

What will I do next week?

I plan to fix anything that comes up as an issue and try to get the branch merged into main asap! That's the main goal.
Apart from that, I will be volunteering to upload more datasets to Hub 2.0's cloud!

Did I get stuck anywhere?

I found myself stuck on a few bugs after creating the draft PR. One issue was that tensor_names and label_names needed every occurrence of '\\' replaced with '/' using string.replace(), as the CircleCI tests failed on Windows. However, setting up fixtures for the Kaggle and ingestion tests is where I spent most of my time understanding foreign code and later integrating it with my tests.
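For reference, the Windows fix described above boils down to a one-line replacement (normalize is an illustrative helper name, not the actual code):

```python
# Minimal illustration of the cross-platform path fix; helper name is mine.
def normalize(name):
    # Windows produces backslash-separated paths; replacing them with
    # forward slashes keeps tensor/label names consistent across platforms.
    return name.replace("\\", "/")
```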
All in all, this week was exciting and a lot of fun! 🐳