What’s Done ✅
→ Updated API
from torch.nn import CrossEntropyLoss
from torch.optim import SGD
from torchvision.models import resnet18

from hub.integrations.cleanlab import clean_labels

# ds is a Hub dataset loaded beforehand.
training_params = {
    'module': resnet18(),
    'criterion': CrossEntropyLoss,
    'optimizer': SGD,
    'epochs': 10,
    'optimizer_lr': 0.01,
    'device': "cpu",
    'folds': 5,
}
clean_labels(
    ds,
    training_params=training_params,
    verbose=True,
    tensors=['images', 'labels'],
    overwrite=False,
    num_workers=1,
    batch_size=1,
    shuffle=True,
    transform={},
    create_tensors=True,
    # ...
)
→ Added create_tensors flag.
The boolean create_tensors flag lets the user confirm whether the new label-issue tensors should be appended to the dataset. If create_tensors is False, is_label_issues and label_quality_scores are only returned as NumPy arrays. If it is True, the tensors is_label_issues and label_quality_scores are created on the dataset and are also returned as NumPy arrays.
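The return behavior described above can be sketched as follows. This is a minimal illustration, not the integration's actual code: store_label_issues is a hypothetical helper, and a plain dict stands in for a Hub dataset.

```python
import numpy as np


def store_label_issues(ds, is_label_issues, label_quality_scores, create_tensors=False):
    """Hypothetical helper mirroring the create_tensors behavior.

    `ds` is a plain dict used as a stand-in for a Hub dataset.
    """
    is_label_issues = np.asarray(is_label_issues, dtype=bool)
    label_quality_scores = np.asarray(label_quality_scores, dtype=float)
    if create_tensors:
        # Stand-in for creating the two tensors on the dataset.
        ds["is_label_issues"] = is_label_issues
        ds["label_quality_scores"] = label_quality_scores
    # The arrays are returned in both cases.
    return is_label_issues, label_quality_scores


ds = {"images": [], "labels": []}

# create_tensors=False: the dataset is left untouched.
issues, scores = store_label_issues(ds, [False, True], [0.9, 0.2], create_tensors=False)
untouched = "is_label_issues" not in ds

# create_tensors=True: both tensors are added and the arrays are still returned.
store_label_issues(ds, [False, True], [0.9, 0.2], create_tensors=True)
created = "is_label_issues" in ds and "label_quality_scores" in ds
```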
→ Added support for providing a validation set for training
clean_labels(ds_train, ds_valid)
- Computing label errors for the validation set is not supported yet
→ Made passing tensor names more explicit
→ Fixed some errors in checking whether an image tensor is RGB or grayscale
→ Minor improvements (e.g. the device is now matched in the core function rather than being a required parameter)
What’s Next
Coding
→ Prune API
prune_labels(ds)
- Instead of deleting samples, let users create a view of the dataset that fetches only correctly labeled samples when filling batches?
- Users could simply run ds = ds[clean_idx] and then use the clean dataset downstream.
- Leave pruning to the users and show it in the blog post instead?
- Create a new branch
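A minimal sketch of the ds = ds[clean_idx] idea, assuming is_label_issues is the boolean array returned by clean_labels. The data here is made up, and a NumPy array stands in for the dataset so the snippet runs on its own.

```python
import numpy as np

# Stand-in data; in practice these would come from the dataset and clean_labels(...).
labels = np.array([0, 1, 1, 0, 2])
is_label_issues = np.array([False, True, False, False, True])

# Indices of samples that were not flagged as label issues.
clean_idx = np.flatnonzero(~is_label_issues).tolist()

# With a Hub dataset this step would be `ds = ds[clean_idx]`;
# the same index list is applied to a NumPy array here.
pruned_labels = labels[clean_idx]
```

Because nothing is deleted, the original dataset stays intact and the pruned view can be recreated from is_label_issues at any time.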
→ Create a guessed_label tensor that stores the labels guessed by the classifier after pruning.
- Relabeling workflow on Activeloop?
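One way the guessed labels could be derived, sketched under the assumption that out-of-sample predicted probabilities are available as an array of shape (n_samples, n_classes). The values and the relabeling step are illustrative only.

```python
import numpy as np

# Hypothetical predicted probabilities, one row per sample.
pred_probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
])
given_labels = np.array([0, 0, 2])
is_label_issues = np.array([False, True, False])

# The classifier's guess is the most probable class for each sample.
guessed_label = pred_probs.argmax(axis=1)

# Possible relabeling step: keep the given label unless the sample was flagged.
relabeled = np.where(is_label_issues, guessed_label, given_labels)
```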
→ Create a custom config for pip install (e.g. pip install "hub[cleanlab]")
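With setuptools, an optional install target like this is usually declared through extras_require. The fragment below is a sketch only: the dependency list is an assumption, not the project's actual requirements.

```python
# setup.py fragment (sketch). The dependency names below are assumptions.
extras_require = {
    "cleanlab": ["cleanlab", "skorch", "torch"],
}

# This dict would be passed to setuptools as setup(..., extras_require=extras_require),
# after which `pip install "hub[cleanlab]"` pulls in the listed packages.
```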
→ Add a branch flag to move to a different branch instead of making a commit on the current branch.
→ Add an add_branch = True flag
→ Add support for bounding boxes, task = 'classification' or task = 'segmentation'
→ Raise an error if the tensor's htype is not image
→ Add support for TensorFlow modules
→ Add optional cleanlab kwargs to pass down
→ Add optional skorch kwargs to pass down
→ Tests
- Unit tests
- Tests with Activeloop datasets
→ Make it possible to skorch(ds)
→ Raise an error if the user doesn't have write access