GSoC Weekly Check-In #3 (June 21)

Published: 06/22/2021

What did I do this week?
I implemented splitting the hashed samples into separate buckets. This should make searching for a hash faster, since a lookup only needs to compare against the hashes in one bucket rather than against every hash.
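
To make the bucketing idea concrete, here is a minimal sketch. It assumes samples are raw bytes hashed with SHA-256 and that the bucket key is the first hex character of the digest. The function names and the choice of key are hypothetical and are not taken from Hub's code.

```python
import hashlib
from collections import defaultdict


def hash_sample(sample: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever hash the project uses.
    return hashlib.sha256(sample).hexdigest()


def build_buckets(samples, prefix_len=1):
    # Group digests by their first `prefix_len` hex characters.
    buckets = defaultdict(list)
    for sample in samples:
        digest = hash_sample(sample)
        buckets[digest[:prefix_len]].append(digest)
    return buckets


def contains(buckets, sample, prefix_len=1):
    # Only the bucket that matches the digest prefix is searched.
    digest = hash_sample(sample)
    return digest in buckets.get(digest[:prefix_len], [])
```

With a one-character hex prefix there are at most 16 buckets, so a lookup scans only the hashes that share the first character with the query.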

What will I do next week?
  • Understand how Hub stores datasets. I don’t fully understand how Hub compresses datasets or how chunking works, so I intend to learn about this.
  • Learn how to work with an Amazon S3 bucket and an EC2 instance. I haven’t used these before, but I’ll need them when working with datasets larger than 1 TB.
  • Replace the hash list with a hash table or Merkle tree. Right now, I’m storing hashes in a list, so a lookup has to scan through it. This doesn’t seem optimal. I’ll try to implement a hash table or a Merkle tree to store these hashes so that lookups are quicker.
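
As a rough illustration of the hash table option in the last item, the sketch below uses a Python dict (which is backed by a hash table) in place of a list. The `HashIndex` class and its methods are hypothetical names for illustration, and SHA-256 is assumed as the hash function.

```python
import hashlib


def hash_sample(sample: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever hash the project uses.
    return hashlib.sha256(sample).hexdigest()


class HashIndex:
    """Maps each digest to the index of the first sample that produced it."""

    def __init__(self):
        self._seen = {}

    def add(self, sample: bytes, index: int) -> bool:
        # Returns True if the sample is new, False if its hash was already stored.
        digest = hash_sample(sample)
        if digest in self._seen:
            return False
        self._seen[digest] = index
        return True

    def lookup(self, sample: bytes):
        # Average-case constant-time lookup; returns None when the hash is absent.
        return self._seen.get(hash_sample(sample))
```

A dict lookup takes constant time on average, whereas checking membership in a list grows linearly with the number of stored hashes.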

Did I get stuck anywhere?
Separating hashes into buckets was a bit more complicated than I thought. The hashes aren’t always split evenly among different buckets.
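
One way to see how uneven the split is would be to count the hashes in each bucket and compare the largest bucket to the average. The sketch below assumes buckets are keyed by a prefix of the hex digest. Both helper names are hypothetical.

```python
from collections import Counter


def bucket_sizes(digests, prefix_len=1):
    # Count how many digests fall into each prefix bucket.
    return Counter(d[:prefix_len] for d in digests)


def imbalance(sizes):
    # Ratio of the largest bucket to the mean bucket size (1.0 means perfectly even).
    mean = sum(sizes.values()) / len(sizes)
    return max(sizes.values()) / mean
```

A ratio close to 1.0 would suggest the buckets are roughly balanced, while a much larger value points to a few overloaded buckets.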