What's Done
- Created a tutorial showing how to find label errors in Hub datasets: "Finding Label Issues in Image Classification Datasets".
- Completed a blog post, "How Noisy Labels Impact ML Models". It covers why labeling errors happen, why they matter, and which tools and techniques can help overcome them. At the end, it shows how to use cleanlab to find noisy labels in Hub datasets.
Next Steps
- Finalize the PR and address reviewers’ feedback.
- Try to create and run unit tests.
- Check whether a custom transform function works with the workflow.
- Finalize function names, such as find_mislabels, fix_issues, find_issues, and add_issues_tensors.
I’ll do the following if I have extra time:
- Add a valid_transform parameter.
- Make it possible to select specific tensors from the validation set.
- Add a message telling the user that the branch was checked out after adding tensors.
- Add a dataset health printout.
- Try to pass x and y instead.