Blog Post #2: Research on Datasets 📕

Published: 07/01/2021

Hey friends!
Welcome to the 2nd blog post on my Google Summer of Code '21 journey. These are exciting times: a lot of work is being done on Hub 2.0, and things are looking better than ever! 🚀

How it started

This week, I put myself in the users' shoes. I started uploading some datasets with Hub. I was using Hub 1.3.7, however, and I encountered this error:

After some fiddling around with my datasets, I still couldn't figure out what the problem was. I approached the community members on the Slack channel, and they asked me to roll back from Hub 1.3.7 to 1.3.5, but the issue persisted. This gave me an opportunity to work with the alpha release of Hub 2.0 to solve my error, specifically with Hub auto! Hub auto is a feature I am working on for Hub 2.0. It currently works on image classification tasks, thanks to my super mentor Dyllan.

How it's going

With the issues I encountered, Dyllan asked me to test Hub auto with datasets. I took this opportunity to document the errors on Notion. Currently, Hub auto works on 3 file extensions: ['.jpeg', '.jpg', '.png']. I created a list of all the file extensions that do not work and throw the error: does not support the " " extension. Available extensions: ['.jpeg', '.jpg', '.png'].
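To give a feel for the kind of check involved, here is a minimal sketch of how the unsupported extensions in a dataset could be tallied before handing it to Hub auto. It is not Hub's code: the names `SUPPORTED_EXTENSIONS`, `tally_unsupported`, and `scan_dataset` are made up for illustration, the extension list is simply copied from the error message above, and lowercasing the extension before comparing is my own assumption.

```python
from collections import Counter
from pathlib import Path

# Assumption: copied from the error message in this post, not read from Hub itself.
SUPPORTED_EXTENSIONS = (".jpeg", ".jpg", ".png")


def tally_unsupported(filenames):
    """Count how often each unsupported extension appears in a list of file names."""
    counts = Counter()
    for name in filenames:
        # Assumption: compare case-insensitively, so "cat.PNG" counts as ".png".
        ext = Path(name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            counts[ext] += 1  # files with no extension are counted under ""
    return counts


def scan_dataset(dataset_dir):
    """Walk a dataset folder recursively and tally its unsupported extensions."""
    files = [p.name for p in Path(dataset_dir).rglob("*") if p.is_file()]
    return tally_unsupported(files)
```

Calling `scan_dataset` on a dataset folder returns a `Counter`, so `.most_common()` lists the extensions that would trip the error most often first.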
I went ahead and created a color-coded toggle list that has the dataset up top and reveals the datasets under the hood. I tested Hub auto with around 200 datasets and made a detailed list of all the errors encountered:

File Extensions

Hub auto errors

Summing up
It's been a productive week. I have utilised my time testing Hub auto, and I am looking forward to bringing it up to the release/2.0 state next week.