GSoC Blog | Activeloop | Week 12

Published: 09/11/2022

What's Done

  • Created a tutorial showing how to find label errors in Hub datasets: Finding Label Issues in Image Classification Datasets
  • Completed a blog post, How Noisy Labels Impact ML Models. It covers why labeling errors happen, why they matter, and which tools and techniques can help address them. It ends by showing how to use cleanlab to find label noise in Hub datasets.
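
To give a feel for what finding label noise involves, here is a minimal pure-Python sketch of the idea behind confident learning, the approach cleanlab builds on. It is a simplification for illustration only, not cleanlab's implementation and not the code from the tutorial or the PR. The function name find_label_issues_sketch and the toy data are hypothetical, and the predicted probabilities are assumed to come from a model evaluated out-of-sample.

```python
from collections import defaultdict


def find_label_issues_sketch(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels:     list of integer class ids, one per example.
    pred_probs: list of per-class probability lists, one per example
                (assumed to be out-of-sample model predictions).
    """
    # Per-class threshold: the mean probability the model assigns to
    # class k, averaged over the examples that are labeled k.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, probs in zip(labels, pred_probs):
        sums[label] += probs[label]
        counts[label] += 1
    thresholds = {k: sums[k] / counts[k] for k in counts}

    issues = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        # Classes the model is "confident" about for this example.
        confident = [k for k in thresholds if probs[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class
        best = max(confident, key=lambda k: probs[k])
        # Flag the example when the confident class disagrees with the label.
        if best != label:
            issues.append(i)
    return issues


# Toy data: example 2 is labeled 0, but the model strongly predicts 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
print(find_label_issues_sketch(labels, pred_probs))
```

In practice cleanlab handles this, along with ranking and pruning of the flagged examples, so a sketch like this is only useful for building intuition.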

Next Steps

  • Finalize the PR and address reviewers’ feedback.
  • Write and run unit tests.
  • Check whether a custom transform function works with the workflow.
  • Settle on final function names, such as find_mislabels, fix_issues, find_issues, add_issues_tensors.

I’ll do the following if I have extra time:

  • Add a valid_transform parameter.
  • Make it possible to select specific tensors from the validation set.
  • Add a message noting that a branch was checked out after the tensors were added.
  • Add a dataset health printout.
  • Try to pass x and y instead.