GSoC Blog | Activeloop | Week 10

lowlypalace
Published: 09/11/2022

What’s Done ✅

→ Updated API

from torch.nn import CrossEntropyLoss
from torch.optim import SGD
from torchvision.models import resnet18

from hub.integrations.cleanlab import clean_labels

# Training configuration forwarded to the classifier.
training_params = {
    'module': resnet18(),
    'criterion': CrossEntropyLoss,
    'optimizer': SGD,
    'epochs': 10,
    'optimizer_lr': 0.01,
    'device': "cpu",
    'folds': 5,
}

clean_labels(
    ds,
    training_params=training_params,
    verbose=True,
    tensors=['images', 'labels'],
    overwrite=False,
    num_workers=1,
    batch_size=1,
    shuffle=True,
    transform={},
    create_tensors=True,
    # ...
)

→ Added create_tensors flag.

The boolean create_tensors flag lets a user confirm whether the new label-issue tensors should be appended to the dataset. If create_tensors is False, the is_label_issues and label_quality_scores numpy arrays are only returned. If it is True, the is_label_issues and label_quality_scores tensors are created as well, and the same values are still returned as numpy arrays.
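
A minimal sketch of the flag's behaviour, assuming a hypothetical helper store_label_issues and a plain dict standing in for the Hub dataset. The real integration writes Hub tensors and returns numpy arrays, so the names and types here are illustrative only.

```python
def store_label_issues(ds, is_label_issues, label_quality_scores, create_tensors=False):
    """Hypothetical helper: optionally persist the results, always return them."""
    if create_tensors:
        # Stand-in for creating the two tensors on the dataset.
        ds["is_label_issues"] = list(is_label_issues)
        ds["label_quality_scores"] = list(label_quality_scores)
    # The results are returned in both cases.
    return is_label_issues, label_quality_scores


ds = {"images": ["img0", "img1"], "labels": [0, 1]}  # toy stand-in dataset

# create_tensors=False: the dataset is left untouched.
issues, scores = store_label_issues(ds, [False, True], [0.97, 0.12])
untouched = "is_label_issues" not in ds

# create_tensors=True: the results are also stored on the dataset.
store_label_issues(ds, issues, scores, create_tensors=True)
```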

→ Added support to provide validation set for training

clean_labels(ds_train, ds_valid)
Computing label errors for the validation set is not supported yet.

→ Made providing tensors names more explicit

→ Fixed some errors related to checking if an image tensor is RGB or Grayscale

→ Minor improvements (e.g. matching device in the core function rather than making it a required parameter)

What’s Next

Coding

→ Prune API

prune_labels(ds)
Instead of deleting samples, could users create a view of the dataset that fetches only correctly labeled samples when filling batches?
- Users could simply write ds = ds[clean_idx] and then use the clean dataset downstream.
Leave pruning to the users and code it up in the blog post instead?
Create a new branch
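
The indexing route could look roughly like this, assuming is_label_issues is the boolean mask returned by clean_labels and that the dataset accepts a list of integer indices. A plain list stands in for the dataset so the snippet runs anywhere, and the mask values are made up.

```python
is_label_issues = [False, True, False, False, True]  # toy mask, not real output
samples = ["s0", "s1", "s2", "s3", "s4"]             # stand-in for ds

# Keep the indices of the samples that were not flagged.
clean_idx = [i for i, flagged in enumerate(is_label_issues) if not flagged]

# With a Hub dataset this step would presumably be ds = ds[clean_idx].
clean_samples = [samples[i] for i in clean_idx]
```

Nothing is deleted here: the original samples stay in place and only the selection changes.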

→ Create a tensor guessed_label to add labels guessed by the classifier after pruning.
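
One plausible way to fill such a tensor is to take the most probable class from the classifier's predicted probabilities. This is a sketch under that assumption; guess_labels and pred_probs are hypothetical names, and the values are toy data.

```python
def guess_labels(pred_probs):
    """Return the index of the most probable class for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in pred_probs]


pred_probs = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]  # toy probabilities
guessed_label = guess_labels(pred_probs)
```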

Relabeling workflow on Activeloop?

→ Create custom config for pip install (e.g. pip install "hub[cleanlab]")
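
Optional dependency groups like this are usually declared through the setuptools extras_require argument. The fragment below is a sketch of what that could look like; the package list is an assumption, not the project's actual configuration.

```python
# Fragment of a setup.py; the dependency list is an assumption.
extras_require = {
    "cleanlab": ["cleanlab", "skorch"],
}

# It would be passed on as setup(..., extras_require=extras_require),
# after which `pip install "hub[cleanlab]"` installs the extra packages.
```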

→ Add a branch flag to move to a different branch instead of making a commit on the current branch.

→ Add flag add_branch = True

~~→ Add support for bounding boxes, task = 'classification' or task = 'segmentation'~~

→ Raise an error if the tensor's htype is not image
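
Such a guard might look like the sketch below. It assumes each tensor exposes an htype attribute; check_image_htype is a hypothetical helper, and SimpleNamespace objects stand in for real tensors.

```python
from types import SimpleNamespace


def check_image_htype(ds, tensor_name):
    """Hypothetical guard: reject tensors whose htype is not 'image'."""
    htype = ds[tensor_name].htype
    if htype != "image":
        raise TypeError(
            f"Tensor '{tensor_name}' has htype '{htype}', expected 'image'."
        )


# Toy stand-in for a dataset with one image tensor and one label tensor.
ds = {
    "images": SimpleNamespace(htype="image"),
    "labels": SimpleNamespace(htype="class_label"),
}

check_image_htype(ds, "images")  # passes silently

try:
    check_image_htype(ds, "labels")
    rejected = False
except TypeError:
    rejected = True
```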

~~→ Add support for TensorFlow modules~~

→ Add optional cleanlab kwargs to pass down

→ Add optional skorch kwargs to pass down

→ Tests

Unit tests
Tests with Activeloop datasets

→ Make it possible to skorch(ds)

→ Raise an error if the user doesn’t have write access
