What did I do this week?
I implemented bucketing for hashed samples. This should make hash lookups faster, since a query hash only has to be compared against the hashes in its own bucket instead of against every stored hash.
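A minimal sketch of the bucketing idea, not the actual implementation. It assumes SHA-256 digests, 16 buckets, and bucket selection by digest value modulo the bucket count; the names `hash_sample`, `bucket_index`, `build_buckets`, and `contains` are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 16  # assumed bucket count, for illustration only


def hash_sample(sample: bytes) -> str:
    """Return a hex digest for one sample (SHA-256 is an assumption)."""
    return hashlib.sha256(sample).hexdigest()


def bucket_index(digest: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a hex digest to a bucket via its integer value modulo num_buckets."""
    return int(digest, 16) % num_buckets


def build_buckets(samples):
    """Group the digests of all samples by bucket index."""
    buckets = defaultdict(list)
    for sample in samples:
        digest = hash_sample(sample)
        buckets[bucket_index(digest)].append(digest)
    return buckets


def contains(buckets, sample: bytes) -> bool:
    """Check for a sample by scanning only the bucket its digest maps to."""
    digest = hash_sample(sample)
    return digest in buckets.get(bucket_index(digest), [])
```

With this layout, a lookup compares the query digest against one bucket rather than the full collection, which is where the expected speedup comes from.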
What will I do next week?
- Understand how Hub stores datasets. I don’t fully understand how Hub compresses datasets or how chunking works, so I intend to study this.
- Learn how to work with an Amazon S3 bucket and an EC2 instance. I haven’t used these before, but I’ll need them when working with datasets larger than 1 TB.
- Replace the hash list with a hash table or Merkle tree. Right now I’m storing hashes in a list, which doesn’t seem optimal because every lookup scans the list. I’ll try a hash table or a Merkle tree so that finding a hash during lookup is quicker.
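To illustrate the last item, here is a small sketch of the hash-table option only (the Merkle tree is not shown). Python's built-in `dict` is a hash table; the digests and the helper names `find_in_list` and `build_hash_table` are made up for this example.

```python
def find_in_list(hash_list, digest):
    """Linear scan: compares against entries one by one until a match."""
    for index, stored in enumerate(hash_list):
        if stored == digest:
            return index
    return None


def build_hash_table(hash_list):
    """Map each digest to the positions where it occurs in the list."""
    table = {}
    for index, digest in enumerate(hash_list):
        table.setdefault(digest, []).append(index)
    return table


# Made-up digests, shortened for readability.
hashes = ["a3f1", "09bc", "77de", "09bc"]
table = build_hash_table(hashes)

print(find_in_list(hashes, "77de"))  # scans the list from the start
print(table.get("09bc", []))         # average-case constant-time lookup
```

A side benefit of the table is that duplicate digests show up directly as entries with more than one position.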
Did I get stuck anywhere?
Separating hashes into buckets was a bit more complicated than I expected. The hashes aren’t always split evenly across the buckets, so some buckets end up larger than others.
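One way to quantify the uneven split is to count the hashes per bucket and compare the largest bucket to the average. This is a sketch under assumed names (`bucket_sizes`, `imbalance`) and an assumed input of one bucket index per hash; it is not part of the current code.

```python
from collections import Counter


def bucket_sizes(bucket_ids, num_buckets):
    """Return the number of hashes in each bucket, including empty buckets."""
    counts = Counter(bucket_ids)
    return [counts.get(i, 0) for i in range(num_buckets)]


def imbalance(sizes):
    """Ratio of the largest bucket to the mean size; 1.0 means an even split."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0
```

A ratio well above 1.0 means lookups in the largest bucket do noticeably more comparisons than the average case.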