What’s Done ✅
→ Updated API
from torch.nn import CrossEntropyLoss
from torch.optim import SGD
from torchvision.models import resnet18

from hub.integrations.cleanlab import clean_labels

# Training configuration for the classifier
training_params = {
    'module': resnet18(),
    'criterion': CrossEntropyLoss,
    'optimizer': SGD,
    'epochs': 10,
    'optimizer_lr': 0.01,
    'device': "cpu",
    'folds': 5,
}

clean_labels(
    ds,
    training_params=training_params,
    verbose=True,
    tensors=['images', 'labels'],
    overwrite=False,
    num_workers=1,
    batch_size=1,
    shuffle=True,
    transform={},
    create_tensors=True,
    # ...
)
→ Added create_tensors flag.
The create_tensors boolean flag lets a user confirm whether to append new label_issues tensors to the dataset. If create_tensors is False, the is_label_issues and label_quality_scores numpy arrays are returned. If True, the tensors is_label_issues and label_quality_scores are created and also returned as numpy arrays.
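The return behaviour described above can be sketched as follows. This is a minimal illustration, not the integration's actual code: `return_label_issues` is a hypothetical helper, a plain dict stands in for the dataset, and Python lists stand in for the numpy arrays so the sketch has no dependencies.

```python
def return_label_issues(ds, is_label_issues, label_quality_scores, create_tensors=False):
    """Hypothetical helper mirroring the create_tensors flag described above."""
    if create_tensors:
        # Append the results to the dataset as new tensors (a dict stands in for `ds`).
        ds["is_label_issues"] = list(is_label_issues)
        ds["label_quality_scores"] = list(label_quality_scores)
    # The arrays are returned in both cases.
    return is_label_issues, label_quality_scores


ds = {"images": ["img0", "img1", "img2"], "labels": [0, 1, 1]}

# create_tensors=False: results are only returned, the dataset is left untouched.
issues, scores = return_label_issues(ds, [False, True, False], [0.97, 0.12, 0.88])
untouched = "is_label_issues" not in ds

# create_tensors=True: results are stored on the dataset and returned as well.
issues, scores = return_label_issues(ds, issues, scores, create_tensors=True)
```

With the flag off, the caller decides what to do with the arrays; with it on, the same arrays are additionally stored next to the images and labels.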
→ Added support for providing a validation set for training
clean_labels(ds_train, ds_valid)
- No support yet for computing label errors on the validation set
→ Made providing tensor names more explicit
→ Fixed some errors related to checking whether an image tensor is RGB or grayscale
→ Minor improvements (e.g. matching device in the core function rather than making it a required parameter)
What’s Next
Coding
→ Prune API
prune_labels(ds)
- Instead of deleting samples, enable users to create an instance of the dataset that only fetches correctly labeled samples when filling up batches?
- Users could simply run ds = ds[clean_idx] and then use the clean dataset downstream.
- Leave pruning to the users and code it up in the blog post instead?
- Create a new branch
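As a sketch of the index-based option above, `clean_idx` can be derived from the boolean `is_label_issues` array. `clean_indices` is a hypothetical helper and a plain list stands in for the dataset. Whether `ds[clean_idx]` accepts a list of indices on a real dataset is an assumption here.

```python
def clean_indices(is_label_issues):
    """Return the positions of samples that were not flagged as label issues."""
    return [i for i, flagged in enumerate(is_label_issues) if not flagged]


is_label_issues = [False, True, False, False, True]
clean_idx = clean_indices(is_label_issues)

# Stand-in for `ds = ds[clean_idx]`: a plain list does not support list indexing,
# so the selection is spelled out explicitly.
samples = ["a", "b", "c", "d", "e"]
clean_samples = [samples[i] for i in clean_idx]

print(clean_idx)      # [0, 2, 3]
print(clean_samples)  # ['a', 'c', 'd']
```

Nothing is deleted in this variant: the flagged samples stay in the original dataset and are simply skipped by the selection.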
→ Create a tensor guessed_label to add labels guessed by the classifier after pruning.
- Relabeling workflow on Activeloop?
→ Create custom config for pip install (e.g. pip install hub['cleanlab'])
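One common way to expose such an optional install is the `extras_require` argument of setuptools. The fragment below is only a sketch. The dependency list is an assumption (cleanlab and skorch are the packages mentioned in these notes), and no version pins are implied.

```python
# Sketch of the relevant part of a setup.py; the dependency list is an assumption.
EXTRAS_REQUIRE = {
    "cleanlab": ["cleanlab", "skorch"],
}

# In setup.py this dict would be passed to setuptools as
#   setup(..., extras_require=EXTRAS_REQUIRE)
# so that installing the package with the `cleanlab` extra also pulls in
# the optional packages.
```

Keeping these packages in an extra means a default install does not pull in the training stack.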
→ Add flag branch to move to a different branch instead of making a commit on the current branch.
→ Add flags add_branch = True
→ Add support for bounding boxes, task = 'classification' or task = 'segmentation'
→ Raise error if not htype image
→ Add support for TensorFlow modules
→ Add optional cleanlab kwargs to pass down
→ Add optional skorch kwargs to pass down
→ Tests
- Unit tests
- Tests with Activeloop datasets
→ Make it possible to call skorch(ds)
→ Raise an error if the user doesn’t have write access