Blog Post #3: Update Hub auto to work with compression 🤖

Eshan
Published: 07/01/2021

Hey friends!
Welcome to the 3rd blog post on my Google Summer of Code '21 journey. This is a follow-up to last week's blog post. Things have taken an interesting turn, with the Wasabi integration and Compression being merged into release/2.0.

Previously, I was working on updating the feature/2.0/hub-auto branch to work with release/2.0. With the latest integration, this process has become a little complex, as a lot of changes have been made, all for the better.

I tried merging feature/2.0/hub-auto into release/2.0 locally. I fixed several merge conflicts; however, I then got stuck on an error.



I spent a few days trying to solve these errors, but my efforts proved futile. I brought this up with my mentor Dyllan, and he suggested an alternate approach. Since neither Compression/Wasabi nor Hub auto was written by me, it was a given that I would face trouble merging the two pieces of code. He also kindly allowed me to take my time with it.

He suggested that I start with release/2.0 and work from there towards Hub 2.0. This approach might take time, but it would give me a deep understanding of how Hub auto works. So I have begun building Hub auto from the ground up.

Shh! I am also giving myself another day to work on the integration of hub 2.0. If it works, great! Otherwise, I will fall back to building it from the ground up.

I will mainly be working on 2 functions:
  • from_kaggle: This function would download a dataset from Kaggle and convert it into a structured Hub dataset, locally.
  • from_path: This function is the essence of Hub auto. It will allow users to convert any unstructured dataset directly into a structured Hub dataset with just one line of code.
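
To make the idea behind from_path a bit more concrete, here is a minimal sketch of the first step such a function might perform: scanning an unstructured folder and pairing each file with a label. It uses only the Python standard library and is not the actual Hub auto code. The function name scan_unstructured_dataset, the "one sub-folder per class" layout, and the list of image extensions are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: only these extensions count as image samples.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def scan_unstructured_dataset(root):
    """Hypothetical helper: list (file, label) pairs for a folder-per-class dataset.

    Assumes a layout such as root/cats/a.jpg and root/dogs/b.png, where the
    name of each top-level sub-folder is the class name.
    """
    root = Path(root)

    # Every top-level directory is treated as one class, in sorted order.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: index for index, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        # Look through the class folder recursively and keep only image files.
        for file in sorted((root / name).rglob("*")):
            if file.is_file() and file.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((file, label_of[name]))

    return class_names, samples
```

A real from_path would then write these samples into a structured Hub dataset, and from_kaggle would presumably download the files first and hand the resulting folder to the same logic. Both steps are left out here because they depend on APIs that this post does not cover.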

I have begun working on these functions, and things are going well so far. I hope to make a lot of progress this week! 🚀