What did I do this week?
This week, I started working with the Hub 2.0 codebase and implemented sample hashing for datasets using MurmurHash3. Hashes are generated for whichever tensor is selected and stored as a JSON file inside the Hub dataset.
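To give a rough idea of what hashing samples looks like, here is a minimal sketch, not the actual Hub code. It assumes each sample is available as raw bytes and uses a pure-Python version of the 32-bit MurmurHash3 variant so that it runs without extra packages. The names `murmur3_32` and `hash_samples`, and the `"hashes"` key in the JSON, are hypothetical and chosen only for illustration.

```python
import json


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant) of a bytes object."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n_blocks = len(data) // 4

    # Body: mix in the input four bytes at a time (little-endian).
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Tail: mix in the remaining 1 to 3 bytes, if any.
    tail = data[4 * n_blocks :]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k

    # Finalization: fold in the length and scramble the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def hash_samples(samples):
    """Return one hash per sample (hypothetical helper, samples are bytes)."""
    return [murmur3_32(sample) for sample in samples]


# Stand-in data: in practice these would be the samples of the selected tensor.
samples = [b"cat image bytes", b"dog image bytes", b"cat image bytes"]
hashes_json = json.dumps({"hashes": hash_samples(samples)})
```

Identical samples always map to the same hash, which is what makes the stored list useful for spotting duplicates later.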
What will I do next week?
Next week, I’ll implement a way to compare the hash list generated for the dataset being loaded against the hash lists in Hub’s cloud storage. This should help prevent dataset duplication, since Hub users will know whether the dataset they’re uploading already exists.
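The comparison could look something like the sketch below. This is only an illustration of the idea under my own assumptions: the stored hash lists are represented as a plain dictionary instead of being read from cloud storage, and `overlap_ratio`, `find_duplicates`, the `threshold` parameter, and the dataset names are all hypothetical.

```python
def overlap_ratio(new_hashes, stored_hashes):
    """Fraction of the new dataset's unique hashes found in a stored list."""
    new_set = set(new_hashes)
    if not new_set:
        return 0.0
    return len(new_set & set(stored_hashes)) / len(new_set)


def find_duplicates(new_hashes, stored_lists, threshold=1.0):
    """Names of stored datasets whose hashes cover at least `threshold` of the new ones."""
    return [
        name
        for name, stored in stored_lists.items()
        if overlap_ratio(new_hashes, stored) >= threshold
    ]


# Stand-in for hash lists that would live in cloud storage.
stored_lists = {
    "datasets/a": [1, 2, 3],
    "datasets/b": [4, 5],
}

matches = find_duplicates([3, 1, 2], stored_lists)  # same samples, different order
```

Comparing sets instead of lists means sample order does not matter, and a threshold below 1.0 would also flag datasets that only partially overlap.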
Did I get stuck anywhere?
I had trouble figuring out how caching works in Hub. A call with my mentor (Abhinav) cleared everything up.