GSoC Weekly Check-In #7 (July 19)

rahulbshrestha
Published: 07/19/2021

What did I do this week?
This week, I started working with the Hub 2.0 codebase. I implemented sample hashing for datasets using MurmurHash3. Hashes are generated for the samples of whichever tensor is selected and stored as a JSON file inside the Hub dataset.
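
As a rough illustration of the idea (not the actual Hub code), per-sample hashing and JSON storage might look like the sketch below. It uses `hashlib.blake2b` from the standard library as a stand-in for MurmurHash3 so that it runs without extra packages. The function names, the `images` tensor name, and the `hashes.json` layout are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def hash_sample(sample: bytes) -> str:
    # Stand-in hash: blake2b with an 8-byte digest, NOT MurmurHash3.
    return hashlib.blake2b(sample, digest_size=8).hexdigest()


def hash_samples(samples):
    # One hash per sample, in the same order as the samples.
    return [hash_sample(s) for s in samples]


def write_hash_list(path, tensor_name, hashes):
    # Hypothetical file layout: the tensor name plus its list of hashes.
    payload = {"tensor": tensor_name, "hashes": hashes}
    Path(path).write_text(json.dumps(payload))


def read_hash_list(path):
    return json.loads(Path(path).read_text())


# Usage: hash three raw samples (two of them identical) and round-trip the file.
samples = [b"cat", b"dog", b"cat"]
hashes = hash_samples(samples)

with tempfile.TemporaryDirectory() as tmp_dir:
    hash_file = Path(tmp_dir) / "hashes.json"
    write_hash_list(hash_file, "images", hashes)
    loaded = read_hash_list(hash_file)
```

Identical samples produce identical hashes, which is what makes the stored list usable for spotting duplicates later.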

What will I do next week?
Next week, I’ll implement a way to compare the hash list generated for the dataset being loaded against the hash lists already in Hub’s cloud storage. This should help prevent dataset duplication: Hub users will know whether the dataset they’re uploading already exists.
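
A minimal sketch of what that comparison could look like, assuming each stored dataset exposes a plain list of sample hashes. The names `similarity` and `find_matches`, the example dataset names, and the threshold parameter are hypothetical and only meant to illustrate the plan.

```python
def similarity(new_hashes, stored_hashes):
    # Fraction of the new dataset's distinct hashes that also appear
    # in the stored dataset's hash list.
    new_set = set(new_hashes)
    if not new_set:
        return 0.0
    return len(new_set & set(stored_hashes)) / len(new_set)


def find_matches(new_hashes, stored_lists, threshold=1.0):
    # Names of stored datasets whose overlap with the new dataset
    # meets or exceeds the threshold (1.0 means every sample matches).
    return [
        name
        for name, stored_hashes in stored_lists.items()
        if similarity(new_hashes, stored_hashes) >= threshold
    ]


# Usage with made-up hash values.
stored_lists = {
    "dataset_a": ["h1", "h2", "h3"],
    "dataset_b": ["h1", "h9"],
}
new_hashes = ["h1", "h2"]

exact_matches = find_matches(new_hashes, stored_lists)
partial_matches = find_matches(new_hashes, stored_lists, threshold=0.5)
```

Using sets keeps the comparison independent of sample order, so a reshuffled copy of a dataset would still be flagged.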

Did I get stuck anywhere?
I had trouble figuring out how caching works in Hub. A call with my mentor (Abhinav) cleared everything up.

