GSoC Weekly Check-In #5 (July 5)

rahulbshrestha
Published: 07/07/2021

What did I do this week?
This week, I benchmarked the performance of different hashing algorithms on datasets of different sizes. Based on the results, I’ve concluded that the best candidates are murmurhash3 and xxhash.
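To give an idea of what such a comparison can look like, here is a minimal sketch of a timing harness. It is not the actual benchmark script: the helper names (`make_samples`, `benchmark`), the sample sizes, and the random input data are all assumptions made for illustration. The standard-library hashes are only there as a baseline, and murmurhash3 and xxhash are added when the third-party `mmh3` and `xxhash` packages happen to be installed.

```python
import hashlib
import os
import time


def make_samples(n_samples, sample_size):
    """Create n_samples random byte strings of sample_size bytes each (stand-in data)."""
    return [os.urandom(sample_size) for _ in range(n_samples)]


def benchmark(hash_fns, samples):
    """Return the total wall-clock seconds each hash function needs to hash all samples."""
    results = {}
    for name, fn in hash_fns.items():
        start = time.perf_counter()
        for sample in samples:
            fn(sample)
        results[name] = time.perf_counter() - start
    return results


# Baseline algorithms from the standard library.
hash_fns = {
    "sha1": lambda b: hashlib.sha1(b).digest(),
    "sha256": lambda b: hashlib.sha256(b).digest(),
    "blake2b": lambda b: hashlib.blake2b(b).digest(),
}

# Optional third-party hashes; skipped if the packages are not installed.
try:
    import mmh3
    hash_fns["murmurhash3"] = lambda b: mmh3.hash128(b)
except ImportError:
    pass

try:
    import xxhash
    hash_fns["xxhash64"] = lambda b: xxhash.xxh64(b).digest()
except ImportError:
    pass


if __name__ == "__main__":
    # Hypothetical dataset sizes: number of samples of 4 KiB each.
    for n in (1_000, 10_000):
        timings = benchmark(hash_fns, make_samples(n, 4096))
        for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
            print(f"{n:>6} samples  {name:<12} {seconds:.4f} s")
```

Timings from a harness like this depend heavily on the machine and on the sample size, so they are only meaningful relative to each other within one run.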

What will I do next week?
Next week, I’ll start integrating my algorithm into Hub. I’ll also look into Bloom filters to see whether they are a viable option for dealing with larger datasets.
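For readers unfamiliar with the idea, a Bloom filter is a compact bit array that answers "have I seen this item?" with no false negatives and a tunable false-positive rate. The sketch below is a generic textbook version, not something taken from Hub or from my algorithm: the `BloomFilter` class, its parameters, and the use of SHA-256 with double hashing to derive bit positions are all assumptions chosen to keep the example dependency-free.

```python
import hashlib
import math


class BloomFilter:
    """A simple Bloom filter over byte strings (illustrative sketch only)."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing formulas for the expected item count and target error rate.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one digest via double hashing.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        """Set the bits for item."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        """True if item may have been added; False if it definitely was not."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter(capacity=1_000_000, error_rate=0.01)
    seen.add(b"sample-0001")
    print(b"sample-0001" in seen)  # an added item is always reported as present
    print(len(seen.bits), "bytes used for the bit array")
```

The appeal for large datasets is memory: the filter stores only bits, not the items or their hashes, at the cost of occasionally reporting an unseen item as present.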

Did I get stuck anywhere?
No, this week’s task was straightforward. One thing I did find problematic, though, was that unzipping datasets with more than 1 million files takes 3-4 hours.

