GSoC Weekly Check-In #5 (July 5)

Published: 07/07/2021

What did I do this week?
This week, I benchmarked the performance of different hashing algorithms across different dataset sizes. Based on the results, I’ve concluded that the best hashing algorithms to use are MurmurHash3 and xxHash.
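
To give a rough idea of what such a benchmark can look like, here is a minimal sketch. It is not the exact script behind the results above. The function names (`candidate_hashers`, `benchmark`), the random byte samples, and the sample size are all assumptions made for illustration. MurmurHash3 and xxHash are only included if the third-party `mmh3` and `xxhash` packages happen to be installed; otherwise the sketch falls back to the standard library hashes alone.

```python
import hashlib
import os
import time


def candidate_hashers():
    """Return a mapping of name -> function(bytes) -> digest to compare."""
    hashers = {
        "sha1": lambda b: hashlib.sha1(b).digest(),
        "sha256": lambda b: hashlib.sha256(b).digest(),
        "blake2b": lambda b: hashlib.blake2b(b).digest(),
    }
    # Optional third-party hashers; skipped if the packages are missing.
    try:
        import mmh3
        hashers["murmurhash3"] = lambda b: mmh3.hash_bytes(b)
    except ImportError:
        pass
    try:
        import xxhash
        hashers["xxhash64"] = lambda b: xxhash.xxh64(b).digest()
    except ImportError:
        pass
    return hashers


def benchmark(num_samples, sample_size=4096):
    """Time each hasher over num_samples random byte strings (hypothetical stand-ins for dataset samples)."""
    samples = [os.urandom(sample_size) for _ in range(num_samples)]
    timings = {}
    for name, hash_fn in candidate_hashers().items():
        start = time.perf_counter()
        for sample in samples:
            hash_fn(sample)
        timings[name] = time.perf_counter() - start
    return timings
```

Calling `benchmark(n)` for a few values of `n` (say 1,000, 10,000, and 100,000) and comparing the returned timings gives a quick picture of how each algorithm scales with dataset size.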

What will I do next week?
Next week, I’ll start integrating my algorithm into Hub. I’ll also look into Bloom filters to see whether they are a viable option for dealing with larger datasets.
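
For readers unfamiliar with Bloom filters: they answer "have I probably seen this item before?" using a fixed-size bit array, which is why they are attractive for large datasets. The sketch below is a generic textbook version, not anything that exists in Hub. The class name, the use of SHA-256 with double hashing to derive bit positions, and the default false positive rate are all assumptions chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false positive rate."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2
        ln2 = math.log(2)
        self.num_bits = max(
            1, math.ceil(-expected_items * math.log(false_positive_rate) / ln2 ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

Items are passed as `bytes`, for example the hash of a sample. After `bf.add(item)`, the check `item in bf` is always true, while an item that was never added may occasionally report true as well, at roughly the configured false positive rate.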

Did I get stuck anywhere?
No, this week’s task was straightforward. One thing I did find problematic, though, was that unzipping datasets with more than 1 million files takes 3-4 hours.