What a week! 3 PRs in one week, loving this! 🔥

Published: 08/15/2021

Hey friends!
Welcome to my blog post on my Google Summer of Code '21 journey. The last week is about to begin, and we'll be in the endgame soon enough. This post, though, is about the 3 new features I have been working on to get into Hub auto quickly before we wrap up GSoC.

3 PRs for 3 features:
  • kaggle fixture
  • auto-compression
  • ingestion summary

Without further ado, let's talk about these features.

    Kaggle fixture [merged]

    Right now, all Kaggle tests run as regular tests. With this PR, I have added a --kaggle fixture so that pytest runs the Kaggle tests specifically when asked to. This makes Hub auto testing for Kaggle datasets efficient and convenient.
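For anyone curious how a flag like this can be wired into pytest, here is a minimal sketch of a conftest.py. Only the --kaggle name comes from the PR; the "kaggle" marker name and the choice to deselect the marked tests are my assumptions for illustration, not necessarily what the Hub code does.

```python
# conftest.py (sketch). The "kaggle" marker name is a hypothetical choice.


def pytest_addoption(parser):
    # Register the command-line flag; it is off unless passed explicitly.
    parser.addoption(
        "--kaggle",
        action="store_true",
        default=False,
        help="also run tests that need Kaggle datasets",
    )


def pytest_configure(config):
    # Declare the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "kaggle: test needs Kaggle access")


def pytest_collection_modifyitems(config, items):
    # With --kaggle, keep every collected test.
    if config.getoption("--kaggle"):
        return
    # Otherwise drop the tests marked "kaggle" from this run.
    selected, deselected = [], []
    for item in items:
        if item.get_closest_marker("kaggle"):
            deselected.append(item)
        else:
            selected.append(item)
    if deselected:
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

With something like this in place, a test decorated with @pytest.mark.kaggle is left out of a plain `pytest` run and included by `pytest --kaggle`.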

    Auto-compression [in review]

    Hub auto did everything automatically except figuring out the compression type for the dataset, so naturally this PR makes that possible. As long as the dataset files are of one of the accepted types (.jpeg, .png, .jpg), setting the 'compression' argument to "auto" will automatically figure out the compression type of the entire dataset.
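To give a feel for the idea, here is a rough sketch of guessing a compression type from file extensions. The function name, the extension mapping, and the "pick the most common type" rule are all my assumptions for illustration; the PR may decide this differently.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping from accepted file extensions to a compression name.
EXTENSION_TO_COMPRESSION = {".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png"}


def guess_compression(root):
    """Return the most common compression among supported files under root."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Extensions are compared case-insensitively, so ".JPG" counts too.
        compression = EXTENSION_TO_COMPRESSION.get(path.suffix.lower())
        if compression is not None:
            counts[compression] += 1
    if not counts:
        raise ValueError(f"no supported image files found under {root}")
    # Assumed rule: the type that covers the most files wins.
    return counts.most_common(1)[0][0]
```

Files with other extensions are simply ignored here, and a directory with no supported images raises an error instead of guessing.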

    Ingestion summary [in review]

    Once the ingestion process is complete, it's pretty neat to give the user an idea of all the files that were skipped due to errors in the dataset.

    Check this out:

    When all files under the directory "images_classification" are ingested.

    When "images_classification/lol.json" and "images_classification/class0/lol.json" are skipped during ingestion.
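Here is a small sketch of how a summary like this could be put together once ingestion has finished. The function name and the exact wording of the messages are hypothetical; the real output in the PR may look different.

```python
def ingestion_summary(root, skipped_files):
    """Build a human-readable summary of an ingestion run (illustrative only)."""
    if not skipped_files:
        return f'All files under "{root}" were ingested.'
    # List the skipped paths in sorted order so the output is stable.
    lines = [f'{len(skipped_files)} file(s) under "{root}" were skipped:']
    lines.extend(f"  {path}" for path in sorted(skipped_files))
    return "\n".join(lines)


# Usage with the two cases described above.
print(ingestion_summary("images_classification", []))
print(
    ingestion_summary(
        "images_classification",
        [
            "images_classification/lol.json",
            "images_classification/class0/lol.json",
        ],
    )
)
```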

    To conclude: 3 amazing features, 1 merged and 2 in review, coming to Hub in a few days.

    with ♥️, E