Articles on lowlypalace's Bloghttps://blogs.python-gsoc.orgUpdates on different articles published on lowlypalace's BlogenSun, 11 Sep 2022 10:15:45 +0000GSoC Blog | Activeloop | Week 12https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-12/<h1>What's Done</h1> <ul> <li>Created a tutorial showing how to find label errors in Hub datasets: <a href="https://colab.research.google.com/drive/1ufji2akWX0r6DcUD70vK3KiBvq0m6xbq#scrollTo=3zK9b4yiMRzB&amp;uniqifier=1">Finding Label Issues in Image Classification Datasets</a></li> <li>Completed a blog post, How Noisy Labels Impact ML Models. The post touches on why labeling errors happen, why it is important to address them, and what tools and techniques can be used to overcome them. At the end, it shows how to use cleanlab to easily find noise in Hub datasets.</li> </ul> <h1>Next Steps</h1> <ul> <li>Finalize the PR and address reviewers’ feedback.</li> <li>Try to create and run unit tests.</li> <li>Check if the <a href="https://docs.activeloop.ai/hub-tutorials/training-models/training-an-object-detection-and-segmentation-model-in-pytorch">custom transform function</a> works with the workflow.</li> <li>Settle on the final names of functions, such as <code>find_mislabels</code>, <code>fix_issues</code>, <code>find_issues</code>, <code>add_issues_tensors</code>.</li> </ul> <p>I’ll do the following if I have extra time:</p> <ul> <li>Add a <code>valid_transform</code> parameter.</li> <li>Make it possible to select specific tensors from the validation set.</li> <li>Add a message noting that the branch was checked out after adding tensors.</li> <li>Add a dataset health printout.</li> <li>Try to pass <code>x</code> and <code>y</code> instead.</li> </ul>danielgareev@gmail.com (lowlypalace)Sun, 11 Sep 2022 10:15:45 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-12/GSoC Blog | Activeloop | Week 11https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-11/<h1>What’s Done ✅</h1> <p>→ Updated API</p> <pre><code>import hub
from torchvision import transforms

from hub.integrations.cleanlab import find_label_issues, create_label_issues_tensors, clean_view
from hub.integrations import skorch

ds = hub.load("hub://ds")

tform = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,)),
    ]
)
transform = {"images": tform, "labels": None}

# Get a scikit-learn compatible PyTorch module to pass into find_label_issues as a classifier
model = skorch(
    dataset=ds,
    epochs=5,
    batch_size=16,
    transform=transform,
    tensors=[],
    valid_transform=None,
    skorch_kwargs=None,
)

# Obtain a DataFrame with columns is_label_issue, label_quality and predicted_label
label_issues = find_label_issues(
    dataset=ds,
    model=model,
    folds=3,
)

# Create the label_issues tensor group on the "labels" branch
create_label_issues_tensors(
    dataset=ds,
    label_issues=label_issues,
    branch="labels",
)

# Get a dataset view where only clean labels are present and the rest are filtered out
ds_clean = clean_view(ds)
</code></pre> <p>→ Link to PR: <a href="https://github.com/activeloopai/Hub/pull/1821">https://github.com/activeloopai/Hub/pull/1821</a></p> <h2>Skorch Integration</h2> <h3><code>skorch()</code></h3> <ul> <li> <p>Added support for providing a validation set for training: <code>skorch(dataset=ds, valid_dataset=valid_ds)</code></p> </li> <li> <p>Added keyword arguments that can be passed into modules to fine-tune the parameters for advanced users.</p> <ul> <li><code>skorch_kwargs</code> arguments to be passed to the skorch <code>NeuralNet</code> constructor.
Additionally, <code>iterator_train__transform</code> and <code>iterator_valid__transform</code> can be used to set params for the training and validation iterators.</li> </ul> </li> <li> <p>Made passing in the images and labels tensors more explicit.</p> </li> <li> <p>Modularized methods.</p> <ul> <li>Separated <code>skorch</code> module from <code>cleanlab</code> to make it easier to instantiate skorch even if you’re not using <code>cleanlab</code> in the downstream.</li> <li>Further modularized <code>skorch</code> module into separate functions and modules.</li> <li>Added utils functions in a separate file.</li> </ul> </li> <li> <p>Added error-checking utils to check errors early.</p> <p>→ Check if a <code>dataset</code> and <code>valid_dataset</code> that’s passed in is a Hub Dataset.</p> <p>→ Check if the tensors’ <code>htypes</code> are supported for image classification tasks.</p> </li> </ul> <h2>Cleanlab Integration</h2> <h3><code>clean_labels()</code></h3> <ul> <li>Implemented a function to compute  <code>guessed_label</code>  by the classifier after pruning.</li> <li>Added flag <code>pretrained</code> to skip cross-validation if <strong>pretrained</strong> model is used to compute out-of-sample probabilities faster on a single <code>fit()</code>.</li> <li>Instead of returning a tuple of numpy ndarrays <code>label_issues</code>, <code>label_quality_scores</code> and <code>predicted_labels</code>, now <code>clean_labels()</code> returns a single <code>label_issues</code> dataframe with columns <code>is_label_issue</code>, <code>label_quality</code>, <code>predicted_label</code>.</li> <li>Added keyword arguments that can be passed into modules to fine-tune the parameters for advanced users. <ul> <li><code>label_issues_kwargs</code> can be be passed to the <code>cleanlab.filter.find_label_issues</code> function.</li> <li><code>label_quality_kwargs</code> can be passed to the <code>cleanlab.rank.get_label_quality_scores</code> function.</li> </ul> </li> </ul> <h3><code>create_tensors()</code></h3> <ul> <li>Added the ability to select <code>branch</code> to commit to when creating tensors</li> <li>Modularized methods. <ul> <li><code>create_tensors()</code> is now a separate method that takes in <code>label_issues</code> dataframe or looks for <code>label_issues</code> in tensors to get a view where only clean labels are present and the rest are filtered out. This will now return <code>commit_id</code>.</li> <li>Added utils functions in a separate file.</li> </ul> </li> <li>Added error-checking utils to check errors early. <ul> <li>Check early if a user has write access to the dataset before creating the tensors.</li> <li>Check if <code>label_issues</code> dataframe columns have correct <code>dtypes</code> and are a subset of a dataset before appending them to tensors.</li> </ul> </li> </ul> <h3><code>clean_view()</code></h3> <ul> <li>Added a method <code>clean_view(ds)</code> to get a dataset view where only clean labels are present, and the rest are filtered out. 
This can be useful to pass the clean dataset to downstream ML frameworks for training.</li> </ul> <h2>Other</h2> <ul> <li>Created custom config for dependencies <code>hub[’cleanlab’]</code>.</li> <li>Created common utils that are reused across modules.</li> <li>Renamed some of the function and variable names to be more clear.</li> <li>Clarified the docstrings parameters and improved readability.</li> <li>Merged <code>main</code> branch and resolved conflicts.</li> <li>Commented on the aspects I’m not sure about in my PR.</li> <li>Run tests on 10+ Activeloop image classification datasets (without creating tensors).</li> </ul> <h1>Next Steps</h1> <ul> <li>Finalize PR after the getting reviewers’ feedback.</li> <li>Try to create and run unit tests.</li> <li>Create a notebook that showcases the workflow (such as <a href="https://docs.activeloop.ai/playbooks/evaluating-model-performance">https://docs.activeloop.ai/playbooks/evaluating-model-performance</a>)</li> <li>Create a blog post with a bit more insight into the problem statement and results of the running workflow on various datasets with varying noise levels (such as <a href="https://www.activeloop.ai/resources/">https://www.activeloop.ai/resources/</a>)</li> </ul>danielgareev@gmail.com (lowlypalace)Sun, 11 Sep 2022 10:11:59 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-11/GSoC Blog | Activeloop | Week 10https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-10/<h1>What’s Done <strong><strong>✅</strong></strong></h1> <p>→ Updated API</p> <pre><code>from hub.integrations.cleanlab import clean_labels training_params = {'module' = resnet18(), 'criterion' = CrossEntropyLoss, 'optimizer' = SGD, 'epochs' = 10, 'optimizer_lr' = 0.01, 'device' = "cpu", 'folds = 5'} clean_labels( ds, training_params = training_params, verbose = True, tensors = ['images', 'labels'], overwrite = False, num_workers = 1, batch_size = 1, shuffle = True, transform = {}, create_tensors = True ... ) </code></pre> <p>→ Added <code>create_tensors</code> flag.</p> <ul> <li><code>create_tensors</code> boolean flag would be useful here to confirm if a user wants to append new <code>label_issues</code> tensor. If the flag <code>create_tensors</code> is <code>False</code>, then <code>is_label_issues</code>, <code>label_quality_scores</code> numpy arrays are returned. If <code>True</code>, tensors <code>is_label_issues</code> and <code>label_quality_scores</code> are created and also returned as numpy arrays.</li> </ul> <p>→ Added support to provide validation set for training</p> <ul> <li><code>clean_labels(*ds_train, ds_valid)*</code></li> <li>No support yet to compute label errors for validation set</li> </ul> <p>→ Made providing tensors names more explicit</p> <p>→ Fixed some errors related to checking if an image tensor is RGB or Grayscale</p> <p>→ Minor improvements (e.g. matching <code>device</code> in the core function rather than making it a required parameter)</p> <h1>What’s Next</h1> <h2>Coding</h2> <p>→ Prune API</p> <ul> <li><code>prune_labels(ds)</code></li> <li>Instead of deleting samples, enable users to create an instance of the dataset that would only fetch correct samples when filling up batches? 
<ul> <li>It could be easily possible for users to <code>ds = ds[clean_idx]</code> and then use a clean dataset for the downstream.</li> </ul> </li> <li>Leave out pruning to the users and code it up in the blog post instead?</li> <li>Create a new branch</li> </ul> <p>→ Create a tensor <code>guessed_label</code> to add labels guessed by the classifier after pruning.</p> <ul> <li>Relabeling workflow on Activeloop?</li> </ul> <p>→ Create custom config for <code>pip install</code> (e.g. <code>pip install hub[’cleanlab’]</code>)</p> <p>→ Add flag <code>branch</code> to move to a different branch instead of making a commit on a current branch.</p> <p>→ Add flags <code>add_branch = True</code></p> <p><s>→ Add support for bounding boxes, <code>task = 'classification'</code> or <code>task = 'segmentation'</code></s></p> <p>→ Raise error if not htype <code>image</code></p> <p><s>→ Add support for <code>TensorFlow</code> modules</s></p> <p>→ Add optional <code>cleanlab</code> kwargs to pass down</p> <p>→ Add optional <code>skorch</code> kwargs to pass down</p> <p>→ Tests</p> <ul> <li>Unit tests</li> <li>Tests with Activeloop datasets</li> </ul> <p>→ Make it possible to <code>skorch(ds)</code></p> <p>→ Raise error if I don’t have write access</p>danielgareev@gmail.com (lowlypalace)Sun, 11 Sep 2022 10:11:13 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-10/GSoC Blog | Activeloop | Week 9https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-9/<h1>What’s Done <strong><strong>✅</strong></strong></h1> <h3>API</h3> <ul> <li>Created an API entry point for cleaning labels in <code>dataset.py</code>. <ul> <li> <p>Cleans the labels of the dataset and creates a set of tensors under <code>label_issues</code> group for the entire dataset.</p> </li> <li> <p>API</p> <pre><code>def clean_labels( self, module = None, criterion = None, optimizer = None, optimizer_lr: int = 0.01, device: str = "cpu", epochs: int = 10, folds: int = 5, verbose: bool = True, tensors: Optional[list] = None, dataloader_train_params: [dict] = None, dataloader_valid_params: Optional[dict] = None, overwrite: bool = False # skorch_kwargs: Optional[dict] = None, ): """ Cleans the labels of the dataset. Computes out-of-sample predictions and uses Confident Learning (CL) algorithm to clean the labels. Creates a set of tensors under label_issues group for the entire dataset. Note: Currently, only image classification task us supported. Therefore, the method accepts two tensors for the images and labels (e.g. ['images', 'labels']). The tensors can be specified in dataloader_train_params or tensors. Any PyTorch module can be used as a classifier. Args: module (class): A PyTorch torch.nn.Module module (class or instance). In general, the uninstantiated class should be passed, although instantiated modules will also work. Default is torchvision.models.resnet18(), which is a PyTorch ResNet-18 model. criterion (class): A PyTorch criterion. The uninitialized criterion (loss) used to optimize the module. Default is torch.nn.CrossEntropyLoss. optimizer (class): A PyTorch optimizer. The uninitialized optimizer (update rule) used to optimize the module. Default is torch.optim.SGD. optimizer_lr (int): The learning rate passed to the optimizer. Default is 0.01. device (str): A PyTorch device. The device on which the module and criterion are located. Default is "cpu". epochs (int): The number of epochs to train for each fit() call. Default is 10. 
tensors (list): A list of tensor names that would be considered for cleaning (e.g. ['images', 'labels']). dataloader_train_params (dict): Keyword arguments to pass into torch.utils.data.DataLoader. Options that may especially impact accuracy include: shuffle, batch_size. dataloader_valid_params (dict): Keyword arguments to pass into torch.utils.data.DataLoader. Options that may especially impact accuracy include: shuffle, batch_size. If not provided, dataloader_train_params will be used with shuffle=False. overwrite (bool): If True, will overwrite label_issues tensors if they already exists. Default is False. fold (int): Sets the number of cross-validation folds used to compute out-of-sample probabilities for each example in the dataset. The default is 5. skorch_kwargs (dict): Keyword arguments to pass into skorch.NeuralNet. Options that may especially impact accuracy include: ... Returns: label_issues: A boolean mask for the entire dataset where True represents a label issue and False represents an example that is confidently/accurately labeled. label_quality_scores: Returns label quality scores for each datapoint, where lower scores indicate labels less likely to be correct. """ </code></pre> </li> </ul> </li> </ul> <h3>Skorch Integration</h3> <ul> <li>Made <code>skorch</code> compatitable with Hub dataset format. <ul> <li>Added the integration <code>skorch.py</code> in <code>hub/integrations/pytorch</code>.</li> <li>Created a class <code>VisionClassifierNet</code> that wraps the PyTorch Module in an sklearn interface.</li> <li>Make skorch compatitable with Hub’s PyTorch Dataloader.</li> <li>Set the defaults for relevant <code>skorch</code> parameters such as <code>module</code>, <code>criterion</code>, <code>optimizer</code>.</li> </ul> </li> </ul> <h3>Core Functions for Cleaning Labels</h3> <ul> <li>Created the component <code>clean_labels.py</code> in <code>hub/core/experimental/labels</code> . <ul> <li>Implemented core function <code>clean_labels()</code> which cleans the labels of a dataset. <ul> <li>Wraps a PyTorch instance in a sklearn classifier. Next, it runs cross-validation to get out-of-sample predicted probabilities for each example. Then, it finds label issues (boolean mask) and label quality scores (floats from 0 to 1) for each sample in the dataset. At the end, it creates tensors with label issues.</li> </ul> </li> <li>Implemented helper functions. <ul> <li><code>get_dataset_tensors()</code> returns the tensors of a dataset. If a list of tensors is not provided, it will try to find them in the <code>dataloader_train_params</code> in the transform. If none of these are provided, it will iterate over the dataset tensors and return any tensors that match <em>htype</em> <code>'image'</code> for images and <em>htype</em> <code>'class_label'</code> for labels. Additionally, this function will also check if the dataset already has a <code>label_issues</code> group.</li> <li><code>estimate_cv_predicted_probabilities()</code> computes an out-of-sample predicted probability for every example in a dataset using cross validation.</li> <li><code>append_label_issues_tensors()</code> creates a group of tensors <code>label_issues</code>. 
After creating tensors, automatically commits the changes.</li> </ul> </li> </ul> </li> </ul>danielgareev@gmail.com (lowlypalace)Sun, 11 Sep 2022 10:09:52 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-9/GSoC Blog | Activeloop | Week 8https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-8/<h2>What did you do this week?</h2> <p>This week, after deriving conclusions from my previous experiments, it was the time to take all of the insights as well as the code and try to make cleanlab work with Hub datasets. After working on the integration for a few weeks, I created my <a href="https://github.com/activeloopai/Hub/pull/1821">draft PR</a>.</p> <h2>What is coming up next?</h2> <p>As a next step, I will be finalizing the API structure, as well as adding some additional functionality to the feature.</p> <h2>Did you get stuck anywhere?</h2> <p>Not really.</p>danielgareev@gmail.com (lowlypalace)Sun, 11 Sep 2022 10:08:41 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-8/GSoC Blog | Activeloop | Week 7https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-7-1/<p>This week, I had a quick sync with the mentors as it seemed like I couldn't derive systematic results from running my previous experiments.</p> <p>I liked the idea to introduce some noise to some dataset that has a low rate (e.g. less than 1-3% of misclassified labels to compare baseline with cleanlab. I used Fashion MNIST and introduced some random corruption to the training set by flipping the labels.</p> <p>In one of my experiments, I set the maximum noise to 50% and gradually introduced 5% the noise at each step, comparing the performance of baseline and cleanlab in parallel. This time, instead of relabelling, I decided to prune the samples with low confidence scores. Here’s a quick example: if I have 60,000 samples in the dataset, at 10% of the noise, I’d randomly flip 6,000 labels. The baseline would then be trained with all samples (60,000), but the cleanlab would be trained only on the labels that weren’t classified as erroneous by cleanlab . For example, If cleanlab found that 5,000 are labeled incorrectly, then I would only use 55,000 images for training.</p> <p><img alt="" height="600" src="https://i.postimg.cc/wx08vThM/newplot-20.png" width="1000"></p> <p>It seems that pruning the samples with lower confidence works well, as cleanlab seems to remove the labels that were introduced with each noise level. We can also see that the accuracy stays around 80% with cleanlab, while with the random noise (e.g. without removing any samples) it drops linearly. On average, I can also see that cleanlab  on average prunes more labels than I initially introduced in the data. The mean of additional samples that cleanlab discards is ≈4500 samples across all noise levels. Since I don’t know the true noise in the original dataset, it’s hard to say whether cleanlab is doing a good job on removing these, but I would argue that cleanlab seems to be overestimating. However, it seems to systematically pick up the newly introduced noisy labels and identify them as erroneous.</p> <p><img alt="" height="600" src="https://i.postimg.cc/NMvY1JM0/newplot-21.png" width="1000"></p> <p>I was surprised that after introducing random noise, CL would still prune the erroneous labels. The way CL algorithm works is by accurately and directly characterizing the uncertainty of label noise in the data. 
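</p> <p>As a concrete illustration of the flip-and-prune setup described above, here is a minimal sketch of the label-flipping step (illustrative only; <code>flip_labels</code> and <code>train_labels</code> are hypothetical names, not the exact code used in these experiments):</p> <pre><code>import numpy as np

def flip_labels(labels, noise_level, num_classes, seed=0):
    """Randomly reassign a fraction of labels to a different, uniformly chosen class."""
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    n_flip = int(noise_level * len(labels))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    for i in flip_idx:
        other_classes = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(other_classes)
    return labels, flip_idx

# e.g. 10% noise on the 60,000 Fashion MNIST training labels flips 6,000 of them
# noisy_labels, flipped_idx = flip_labels(train_labels, noise_level=0.10, num_classes=10)
</code></pre> <p>The baseline is then trained on all samples with the noisy labels, while the cleanlab run drops the samples flagged as label issues before training.</p> <p>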
The foundation CL depends on is that label noise is class-conditional, depending only on the latent true class, not the data . For instance, a leopard is likely to be mistakenly labeled as a jaguar . Cleanlab takes that assumption and computes joint distribution among different classes (e.g. 3% of the data is labeled leopard (noisy label), but the true label is jaguar). The main idea is that underlying data has implications for the labeler’s decisions, and I basically took this assumption out of equation after randomly swapping labels in the dataset (e.g. I didn’t care if a certain class would be more likely to be mislabelled as another class). I would think that CL algorithm relies on this assumption heavily to guess which label to prune, but the performance was still accurate and stable. In the real-world noisy data, the mislabelling between different classes would have a stronger statistical dependence, so I can say that this example was even a bit more difficult for cleanlab.</p> <p>I’ve tried to experiment with different threshold values for pruning and relabelling the images (e.g. remove <b>20%</b> of the images with lowest label quality, but leave and relabel the rest). I’ve started with a threshold of <b>0%</b> (e.g. relabel all labels to the ones predicted by <code>cleanlab</code> ) and then gradually increased the threshold value with a <b>10%</b> step till I reached <b>100%</b> prune level (e.g. remove all labels that were found to be erroneous by <code>cleanlab</code>). As before, I run these from <b>0</b> to <b>50% </b>of the noise level.On the graph, I plotted the accuracy of the models trained with training sets that were fixed with different threshold values. For example, <code>100% Prune / 0% Relabel</code> indicates the accuracy of the model when all erroneous samples and their labels were deleted, while <code>0% Prune / 100% Relabel</code> shows the accuracy of the model when all of the samples were left but relabelled.Looking at the graph, I can say that <code>cleanlab</code> definitely does a great job at <i>identifying</i> labels, but not necessarily at <i>fixing them automatically</i>. As soon as I increase the % of labels that I’d like to relabel, the accuracy starts to go down in linear way. The training set with <b>100%</b> of pruning got the highest accuracy, while the training set with all labels relabelled got the worst accuracy on the fixed model.As a next step, I can try to see what happens if we only remove a certain % of erroneous samples, but leave the labels of the other erroneous samples as they are. (edited) </p> <p><img alt="" src="https://i.postimg.cc/Sx9vWyzW/newplot-24.png"></p> <p>I also run this pipeline on the <a href="https://https-deeplearning-ai.github.io/data-centric-comp/">Roman Dataset (DCAI)</a>. This dataset is quite noisy, so there’s not a ton of improvement on 0-10% of the noise level. However, as I introduce more noisy labels, looks like <code>cleanlab</code> is still able to pick them up. Running a few more trials to see what’s the performance with different Prune vs Relabel threshold. 
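</p> <p>For reference, a rough sketch of how an "X% Prune / Y% Relabel" split can be implemented on top of cleanlab is shown below. This is not the exact experiment code: <code>noisy_labels</code> and <code>pred_probs</code> are assumed to be the label array and the out-of-sample predicted probabilities from the cross-validated model.</p> <pre><code>import numpy as np
from cleanlab.filter import find_label_issues
from cleanlab.rank import get_label_quality_scores

issue_mask = find_label_issues(labels=noisy_labels, pred_probs=pred_probs)
quality = get_label_quality_scores(labels=noisy_labels, pred_probs=pred_probs)

def prune_and_relabel(labels, issue_mask, quality, pred_probs, prune_frac):
    """Drop the lowest-quality prune_frac of flagged samples; relabel the remaining flagged ones."""
    labels = labels.copy()
    flagged = np.where(issue_mask)[0]
    flagged = flagged[np.argsort(quality[flagged])]  # lowest label quality first
    n_prune = int(prune_frac * len(flagged))
    pruned, to_relabel = flagged[:n_prune], flagged[n_prune:]
    labels[to_relabel] = pred_probs[to_relabel].argmax(axis=1)  # cleanlab's guessed labels
    keep_idx = np.setdiff1d(np.arange(len(labels)), pruned)
    return keep_idx, labels

# e.g. "20% Prune / 80% Relabel"
keep_idx, fixed_labels = prune_and_relabel(noisy_labels, issue_mask, quality, pred_probs, prune_frac=0.2)
</code></pre> <p>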
</p> <p><img alt="" height="600" src="https://i.postimg.cc/L6cy4r0h/newplot-25.png" width="1000"></p> <p><img alt="" src="https://i.postimg.cc/mr8N9qVY/newplot-26.png"></p> <p><img alt="" src="https://i.postimg.cc/C1RPxSYB/newplot-27.png"></p>danielgareev@gmail.com (lowlypalace)Sun, 11 Sep 2022 10:02:06 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-7-1/GSoC Blog | Activeloop | Week 7https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-7/<p>This week, I worked on running the experiments on a variety of datasets. </p> <p>Here are a few exeriments that I did to benchmark <code>cleanlab</code> performance of three different datasets (<a href="https://https-deeplearning-ai.github.io/data-centric-comp/">MNIST Roman</a> (DCAI), <a href="https://www.notion.so/Dataset-Optimization-a658f8d56d0b4ba5b06cd7e8d8719b33">Flower 102</a>, <a href="https://www.notion.so/Dataset-Optimization-a658f8d56d0b4ba5b06cd7e8d8719b33">Fashion MNIST</a>). For all experiments, I used a fixed model and applied resizing and normalization to the original images.</p> <p>What do 1️⃣, 2️⃣, 3️⃣, 4️⃣ mean? I re-run the entire fitting with a a few random seeds to get an estimate of the variance between the accuracies of <code>baseline</code> and <code>cleanlab</code>.</p> <blockquote> <p>1️⃣  = <code>seed(0)</code></p> <p>2️⃣  = <code>seed(1)</code></p> <p>3️⃣  = <code>seed(123)</code></p> <p>4️⃣ = <code>seed(42)</code></p> </blockquote> <h2>First Training Run</h2> <table> <tbody> <tr> <td> </td> <td>Roman MNIST (DCAI)</td> <td>Flower 102</td> <td>Fashion MNIST</td> <td>KMNIST</td> <td> </td> </tr> <tr> <td>&lt;mark&gt;<strong>Baseline Accuracy</strong>&lt;/mark&gt;</td> <td> <p>1️⃣ <strong>0.7945</strong> </p> <p>2️⃣ <strong>0.8511</strong> </p> <p>3️⃣ <strong>0.7699</strong></p> </td> <td> <p>1️⃣ <strong>0.6568</strong> </p> <p>2️⃣ <strong>0.6176</strong> </p> <p>3️⃣ <strong>0.6274</strong></p> </td> <td> <p>1️⃣ <strong>0.8958</strong> </p> <p>2️⃣ <strong>0.8944</strong></p> <p> 3️⃣ <strong>0.8987</strong></p> </td> <td> </td> <td> </td> </tr> <tr> <td>&lt;mark&gt;<strong>+ Cleanlab Accuracy</strong>&lt;/mark&gt;</td> <td> <p>1️⃣ <strong>0.7933 → </strong>-0.0012 ⬇️</p> <p>2️⃣ <strong>0.8031 → </strong>-0.048 ⬇️</p> <p>3️⃣ <strong>0.7109 → </strong>-0.059 ⬇️</p> </td> <td> <p>1️⃣ <strong>0.5421 → -</strong>0.1147 ⬇️</p> <p>2️⃣ <strong>0.5441 → </strong>-0.0735 ⬇️</p> <p>3️⃣ <strong>0.5647 → </strong>-0.0627 ⬇️</p> </td> <td> <p>1️⃣ <strong>0.8992 → </strong>0.0034 ⬆️</p> <p>2️⃣ <strong>0.8951 →</strong> 0.0007 ⬆️</p> <p>3️⃣ <strong>0.8866 → </strong>-0.0121 ⬇️</p> </td> <td> </td> <td> </td> </tr> <tr> <td>Parameters</td> <td><code>batch_size = 8 model = resnet18() train_shuffle = True test_shuffle = False train_split = None epochs = 10</code></td> <td><code>batch_size = 16 model = resnet18() train_shuffle = True test_shuffle = False train_split = None epochs = 10</code></td> <td><code>batch_size = 32 model = resnet18() train_shuffle = True test_shuffle = False train_split = None epochs = 10</code></td> <td> </td> <td> </td> </tr> <tr> <td>Transform</td> <td><code>Resize((224, 224)), ToTensor(), Normalize( [0.485, 0.456, 0.406], [0.229, 0.224, 0.225] )</code></td> <td><code>Resize((224, 224)), ToTensor(), Normalize( [0.485, 0.456, 0.406], [0.229, 0.224, 0.225] )</code></td> <td><code>ToTensor(), Normalize((0.), (1.))</code></td> <td><code>ToTensor(), Normalize((0.), (1.))</code></td> <td><code>Resize((300, 300)), ToTensor(), Normalize( [0.485, 0.456, 0.406], [0.229, 
0.224, 0.225] )</code></td> </tr> <tr> <td>Network</td> <td><code>resnet = models.resnet18() resnet.fc = nn.Linear(resnet.fc.in_features, 10)</code></td> <td><code>resnet = models.resnet18() resnet.fc = nn.Linear(resnet.fc.in_features, 102)</code></td> <td><code>resnet = models.resnet18() resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)</code> <code>resnet.fc = nn.Linear(resnet.fc.in_features, 10)</code></td> <td><code>resnet = models.resnet18() resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)</code> <code>resnet.fc = nn.Linear(resnet.fc.in_features, 10)</code></td> <td><code>resnet = models.resnet18() resnet.fc = nn.Linear(resnet.fc.in_features, 47)</code></td> </tr> <tr> <td>Parameters</td> <td><code>batch_size = 8 model = resnet18() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 10</code></td> <td><code>batch_size = 16 model = resnet18() train_shuffle = True test_shuffle = False train_split = None epochs = 10</code></td> <td><code>batch_size = 32 model = resnet18() train_shuffle = True test_shuffle = False train_split = None epochs = 10</code></td> <td><code>batch_size = 64 model = resnet18() train_shuffle = True test_shuffle = False train_split = None epochs = 10</code></td> <td><code>batch_size = 32 model = resnet18() train_shuffle = True test_shuffle = False train_split = None epochs = 10</code></td> </tr> <tr> <td>Number of Classes</td> <td>10</td> <td>102</td> <td>10</td> <td>10</td> <td>47</td> </tr> <tr> <td>Images Dimension</td> <td>224x224</td> <td>224x224</td> <td>28x28</td> <td>28x28</td> <td>300x300</td> </tr> </tbody> </table> <h2>Training with 20 Epochs</h2> <p>In the results below, I used <code>epochs = 20</code> instead of <code>epochs = 10</code>. The rest of the parameters (e.g. 
Network, Transform) are unchanged.</p> <table> <tbody> <tr> <td> </td> <td>MNIST Roman</td> <td>Flower 102</td> <td>MNIST Fashion</td> <td>KMNIST</td> <td>Describable Textures Dataset</td> </tr> <tr> <td>&lt;mark&gt;<strong>Baseline Accuracy</strong>&lt;/mark&gt;</td> <td> <p>1️⃣ <strong>0.6875</strong></p> <p> 2️⃣ <strong>0.7736</strong> </p> <p>3️⃣ <strong>0.6617 </strong></p> <p><strong>4️⃣ 0.7945 </strong></p> <p>Mean = 0.7293</p> </td> <td> <p>1️⃣ <strong>0.5421 </strong></p> <p>2️⃣ <strong>0.5617</strong> </p> <p>3️⃣ <strong>0.6578</strong> </p> <p>4️⃣ <strong>0.6294</strong></p> </td> <td> <p>1️⃣ <strong>0.891</strong> </p> <p>2️⃣ <strong>0.8977</strong></p> <p>3️⃣ 0.8977</p> </td> <td> </td> <td> </td> </tr> <tr> <td>&lt;mark&gt;<strong>+ Cleanlab Accuracy</strong>&lt;/mark&gt;</td> <td> <p>1️⃣ <strong>0.7257</strong> <strong>→</strong> 0.0382 ⬆️</p> <p>2️⃣ <strong>0.8400</strong> <strong>→</strong> 0.0664 ⬆️</p> <p><strong> </strong>3️⃣ <strong>0.8511</strong> <strong>→</strong> 0.1894 ⬆️</p> <p>4️⃣ <strong>0.8757</strong> <strong>→</strong> 0.0812</p> <p>⬆️ Mean = 0.8231</p> <p>Mean Difference = <strong>0.0938</strong> ⬆️</p> </td> <td> <p>1️⃣ <strong>0.6117 → </strong>0.0696 ⬆️</p> <p>2️⃣<strong> 0.6254</strong> <strong>→ </strong>0.0833 ⬆️</p> <p>3️⃣ <strong>0.5598</strong> <strong>→ </strong>-0.098 ⬇️</p> <p>4️⃣ <strong>0.5705</strong> <strong>→ </strong>-0.0589 ⬇️</p> </td> <td> <p>1️⃣ <strong>0.8982</strong> </p> <p>3️⃣ 0.897</p> </td> <td> </td> <td> </td> </tr> <tr> <td>Parameters</td> <td><code>batch_size = 8 model = resnet18() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 20</code></td> <td><code>batch_size = 16 model = resnet18() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 20</code></td> <td><code>batch_size = 32 model = resnet18() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 20</code></td> <td><code>batch_size = 64 model = resnet18() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 20</code></td> </tr> </tbody> </table> <h2>Training with 30 Epochs</h2> <p>In the results below, I used <code>epochs = 30</code> instead of <code>epochs = 20</code>. The rest of the parameters (e.g. 
Network, Transform) are unchanged.</p> <table> <tbody> <tr> </tr> <tr> <td>MNIST Roman</td> <td>Flower 102</td> <td>MNIST Fashion</td> <td>KMNIST</td> <td>Describable Textures Dataset</td> </tr> <tr> <td>&lt;mark&gt;<strong>Baseline Accuracy</strong>&lt;/mark&gt;</td> <td> <p>0.8425</p> <p>0.8831</p> <p>0.8769</p> <p>0.8560</p> <p>Mean = 0.8646</p> </td> <td> </td> <td> </td> <td> </td> <td> </td> </tr> <tr> <td>&lt;mark&gt;<strong>+ Cleanlab Accuracy</strong>&lt;/mark&gt;</td> <td> <p>0.8646 → 0.0221</p> <p>0.8228 → -0.0602</p> <p>0.8696 → -0.0073</p> <p>0.8720 → 0.0159</p> <p>Mean = 0.8573</p> <p>Mean Difference = 0.0073</p> </td> <td> </td> <td> </td> <td> </td> <td> </td> </tr> <tr> <td>Parameters</td> <td><code>batch_size = 8 model = resnet18() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 30</code></td> <td><code>batch_size = 16 model = resnet18() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 30</code></td> <td><code>batch_size = 32 model = resnet18() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 30</code></td> <td><code>batch_size = 64 model = resnet18() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 30</code></td> </tr> </tbody> </table> <p> </p> <h2>Training with Resnet50</h2> <p>In the results below, all of the parameters stay the same as in run above, but this time I changed the network to <code>resnet50()</code> instead of <code>resnet18()</code>. Epochs are also back to <code>epochs = 20</code>.</p> <table> <tbody> <tr> <td> </td> <td>MNIST Roman</td> <td>Flower 102</td> <td>MNIST Fashion</td> <td> </td> <td>Describable Textures Dataset</td> </tr> <tr> <td>&lt;mark&gt;<strong>Baseline Accuracy</strong>&lt;/mark&gt;</td> <td> <p>1️⃣ <strong>0.7835</strong> </p> <p>2️⃣ <strong>0.7589</strong> </p> <p>3️⃣ <strong>0.8068</strong> </p> <p>4️⃣ <strong>0.8560</strong></p> </td> <td> </td> <td> </td> <td> </td> <td> </td> </tr> <tr> <td>&lt;mark&gt;<strong>+ Cleanlab Accuracy</strong>&lt;/mark&gt;</td> <td> <p>1️⃣ <strong>0.8044</strong></p> <p> 2️⃣ <strong>0.8560</strong> </p> <p>3️⃣ <strong>0.8376</strong> </p> <p>4️⃣ <strong>0.7859</strong></p> </td> <td> </td> <td> </td> <td> </td> <td> </td> </tr> <tr> <td>Parameters</td> <td><code>batch_size = 8 model = resnet50() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 20</code></td> <td><code>batch_size = 16 model = resnet50() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 20</code></td> <td><code>batch_size = 32 model = resnet50() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 20</code></td> <td><code>batch_size = 64 model = resnet50() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 20</code></td> <td><code>batch_size = 8 model = resnet50() train_shuffle = True test_shuffle = False train_split = None</code> <code>epochs = 20</code></td> </tr> </tbody> </table> <h2>Training with Validation Set</h2> <p>In the results below, all of the parameters stay the same as in run above, however, in this run I tried to run trainings with validation set. Therefore, 20% of the dataset is used for the internal training validation. In other datasets where the validation set exist, it’s used as a validation set. 
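</p> <p>For reference, the validation-split setups in the tables below correspond roughly to the following skorch configuration (a simplified sketch on plain arrays; the actual runs wrap the Hub dataloader, and <code>valid_data</code>/<code>valid_labels</code> stand in for a predefined validation split):</p> <pre><code>import torch.nn as nn
from torchvision import models
from skorch import NeuralNetClassifier
from skorch.dataset import Dataset, ValidSplit
from skorch.helper import predefined_split

resnet = models.resnet18()
resnet.fc = nn.Linear(resnet.fc.in_features, 10)

# Internal hold-out of 1/5 of the training data (20%) when no predefined validation set exists
net = NeuralNetClassifier(
    resnet,
    criterion=nn.CrossEntropyLoss,
    max_epochs=20,
    batch_size=32,
    train_split=ValidSplit(cv=5, stratified=False),
    iterator_train__shuffle=True,
)

# When a predefined validation set exists (e.g. Flower 102), use it directly:
# net.set_params(train_split=predefined_split(Dataset(valid_data, valid_labels)))
</code></pre> <p>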
I set the model back to <code>resnet18()</code> as it seems it gives better baseline accuracies over three datasets.</p> <table> <tbody> <tr> </tr> <tr> <td>MNIST Roman</td> <td>Flower 102</td> <td>MNIST Fashion</td> <td> </td> <td>Describable Textures Dataset</td> </tr> <tr> <td>&lt;mark&gt;<strong>Baseline Accuracy</strong>&lt;/mark&gt;</td> <td> <p>1️⃣ <strong>0.6432</strong> </p> <p>2️⃣ <strong>0.6445</strong> </p> <p>3️⃣ <strong>0.6777 </strong></p> <p>4️⃣ <strong>0.6383 </strong></p> <p>Mean = 0.6509</p> </td> <td> <p>0.6421</p> <p>0.6578</p> <p>0.6823</p> <p>0.6372</p> <p>Mean = 0.6549</p> </td> <td> <p>0.8974</p> <p>0.891</p> <p>0.8906</p> <p>0.894</p> <p>Mean = 0.89325</p> </td> <td> <p>0.9425</p> <p>0.9439</p> <p>0.9436</p> <p>0.9364</p> <p>Mean = 0.9416</p> </td> <td> </td> </tr> <tr> <td>&lt;mark&gt;<strong>+ Cleanlab Accuracy</strong>&lt;/mark&gt;</td> <td> <p>1️⃣ <strong>0.5584</strong> </p> <p>2️⃣ <strong>0.5879</strong></p> <p>3️⃣ <strong>0.6076 </strong></p> <p>4️⃣ <strong>0.4587 </strong>Mean = 0.5531</p> </td> <td> </td> <td> </td> <td> </td> <td> </td> </tr> <tr> <td>Parameters</td> <td><code>batch_size = 8 model = resnet18() train_shuffle = True test_shuffle = False</code> <code>train_split=ValidSplit(cv=5, stratified=False) valid_shuffle = False</code> <code>epochs = 20</code></td> <td><code>batch_size = 16 model = resnet18() train_shuffle = True test_shuffle = False</code> <code>train_split=predefined_split(Dataset(valid_data, valid_labels)) valid_shuffle = False</code> <code>epochs = 20</code></td> <td><code>batch_size = 32 model = resnet18() train_shuffle = True test_shuffle = False</code> <code>train_split=ValidSplit(cv=5, stratified=False) valid_shuffle = False</code> <code>epochs = 20</code></td> <td><code>batch_size = 64 model = resnet18() train_shuffle = True test_shuffle = False</code> <code>train_split=ValidSplit(cv=5, stratified=False)</code> <code>valid_shuffle = False</code> <code>epochs = 20</code></td> </tr> </tbody> </table> <p> </p> <h2>Training with Validation Set (Stratified Sampling)</h2> <p>Using arbitrary random seed can result in large differences between the training and validation set distributions. These differences can have unintended downstream consequences in the modeling process. As an example, the proportion of digit X can much higher in the training set than in the validation set. To overcome this, I’m using stratified sampling (sampling from each class with equal probability) to create the validation set for the datasets where it’s not available by default (e.g. 
MNIST Roman, MNIST Fashion, KMNIST).</p> <table> <tbody> <tr> <td> </td> <td>MNIST Roman</td> <td>Flower 102</td> <td>MNIST Fashion</td> <td> </td> <td>Describable Textures Dataset</td> </tr> <tr> <td>&lt;mark&gt;<strong>Baseline Accuracy</strong>&lt;/mark&gt;</td> <td> <p>0.7515</p> <p>0.6998</p> <p>0.7958</p> <p>0.8610</p> <p>Mean = 0.7770</p> </td> <td>N/A</td> <td> <p>0.8928</p> <p>0.8948</p> <p>0.8969</p> <p>0.895</p> <p>Mean = 0.894875</p> </td> <td> </td> <td>N/A</td> </tr> <tr> <td>&lt;mark&gt;<strong>+ Cleanlab Accuracy</strong>&lt;/mark&gt;</td> <td> <p>0.6027 → -0.1488</p> <p>0.8228 → 0.1230</p> <p>0.8130 → 0.0172</p> <p>0.6900 → -0.1709</p> <p>Mean = 0.7321</p> <p>Mean Difference = -0.0448</p> </td> <td> </td> <td> </td> <td> </td> <td> </td> </tr> <tr> <td>Parameters</td> <td><code>batch_size = 8 model = resnet18() train_shuffle = True test_shuffle = False</code> <code>train_split=ValidSplit(cv=5, stratified=True) valid_shuffle = False</code> <code>epochs = 20</code></td> <td> </td> <td><code>batch_size = 32 model = resnet18() train_shuffle = True test_shuffle = False</code> <code>train_split=ValidSplit(cv=5, stratified=True) valid_shuffle = False</code> <code>epochs = 20</code></td> <td><code>batch_size = 64 model = resnet18() train_shuffle = True test_shuffle = False</code> <code>train_split=ValidSplit(cv=5, stratified=True)</code> <code>valid_shuffle = False</code> <code>epochs = 20</code></td> </tr> </tbody> </table> <h2>Training with Early Stopping</h2> <table> <tbody> <tr> <td> </td> <td>MNIST Roman</td> <td>Flower 102</td> <td>MNIST Fashion</td> <td> </td> <td>Describable Textures Dataset</td> </tr> <tr> <td>&lt;mark&gt;<strong>Baseline Accuracy</strong>&lt;/mark&gt;</td> <td> <p>0.8327</p> <p>0.8388</p> <p>0.7921</p> <p>0.8597</p> <p>Mean = 0.8308</p> </td> <td> <p>0.6705</p> <p>0.6578</p> <p>0.6558</p> <p>0.6313</p> <p>Mean = 0.6539</p> </td> <td> <p>0.8856,</p> <p>0.8916,</p> <p>0.8856,</p> <p>0.8917</p> <p>Mean = 0.88862</p> </td> <td> </td> <td> </td> </tr> <tr> <td>&lt;mark&gt;<strong>+ Cleanlab Accuracy</strong>&lt;/mark&gt;</td> <td> <p>0.8683</p> <p>0.8339</p> <p>0.8597</p> <p>0.8105</p> <p>Mean = 0.8431</p> </td> <td> <p>0.6539</p> <p>0.6176</p> </td> <td> <p>0.887</p> <p>0.8904</p> </td> <td> </td> <td> </td> </tr> <tr> <td>Parameters</td> <td><code>callbacks=[EarlyStopping(monitor='train_loss', patience=5)]</code></td> <td><code>callbacks=[EarlyStopping(monitor='train_loss', patience=5)]</code> <code>train_split=predefined_split(Dataset(valid_data, valid_labels))</code></td> <td> </td> <td> </td> </tr> </tbody> </table> <p>Notebooks to reproduce results:</p> <ul> <li> <p><a href="https://www.notion.so/Dataset-Optimization-a658f8d56d0b4ba5b06cd7e8d8719b33">Roman MNIST</a></p> </li> <li> <p><a href="https://colab.research.google.com/drive/1pse7vnwKYMbZ0ahwZt68hxPBYp4zF4ec?usp=sharing"><strong>Flower 102</strong></a></p> </li> <li> <p><a href="https://colab.research.google.com/drive/1H4L5Ynch4IA2Cdu28bfTR0FO-qHCCnMF?usp=sharing"><strong>Fashion MNIST</strong></a></p> </li> </ul>danielgareev@gmail.com (lowlypalace)Sun, 11 Sep 2022 09:50:57 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-7/GSoC Blog | Activeloop | Week 6https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-6/<h2>What did you do this week?</h2> <p>This week, I've set up a pipeline for running experiments on a variety of datasets to compare the accuracies of baseline (the model trained on dirty data) and cleanlab (the model trained on clean 
data).</p> <h2>What is coming up next?</h2> <p>As a next step, I will be running experiments on a variety of datasets to benchmark the accuracy.</p> <h2>Did you get stuck anywhere?</h2> <p>Not really, it was a bit tricky to set up a pipeline in a way that's reproducible. I managed to overcome this by fixing the seeds.</p>danielgareev@gmail.com (lowlypalace)Sun, 11 Sep 2022 09:43:49 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-6/GSoC Blog | Activeloop | Week 5https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-5/<p>This week, I've been focusing on the high-level overview of data-centric strategies. I used <a href="https://worksheets.codalab.org/worksheets/0x7a8721f11e61436e93ac8f76da83f0e6">Roman MNIST Dataset</a> that was provided in the <a href="https://https-deeplearning-ai.github.io/data-centric-comp/">Data-Centric AI Competition</a>. For all experiments, I used fixed Resnet50 and applied resizing and normalization to the original dataset. I also used a fixed seed to be able to replicate the runs. Here are some of the metrics I’ve been getting:</p> <ol> <li><strong>Fixing Labels with Cleanlab.</strong></li> </ol> <ul> <li>Baseline Accuracy: 0.7318</li> <li>+ <code>CleanLearning</code> Accuracy: 0.8290</li> </ul> <ol> <li><strong>Automatic Augmentations with <a href="https://albumentations.ai/docs/autoalbument/">AutoAlbument</a> (uses <a href="https://arxiv.org/abs/1911.06987">Faster AutoAugment</a> algorithm)</strong></li> </ol> <ul> <li>Baseline Accuracy: 0.7318</li> <li>+ <code>AutoAlbument</code> Augmentations: 0.7404</li> </ul> <ol> <li><strong>Augmentations with Basic Augmentations and <a href="https://pytorch.org/vision/main/transforms.html#automatic-augmentation-transforms">Pre-trained Torch Policies</a></strong></li> </ol> <ul> <li>Baseline Accuracy: 0.7318</li> <li>Basic Baseline Augmentations: 0.8696</li> <li>ImageNet Pre-Trained Policy: 0.8560</li> </ul> <p>Here are my findings for these strategies:</p> <ul> <li><strong>Fixing Labels with Cleanlab.</strong> <ul> <li>I’m now trying out smaller <code>k</code> values for cross-validation to see to which extent it impacts the accuracy and improves the training speed. I will also try out more epoch ranges to see if this has impact on the accuracy. I’ve noticed that <code>cleanlab</code> performs well only when we already have a robust baseline model. For this specific dataset, I’ve noticed that if the accuracy of the baseline model is less than 0.7, then <code>cleanlab</code> actually has negative effect on the accuracy. I believe this is because of the confident learning algorithm, as it needs to get as accurate confidence scores for each label as possible.</li> <li>For now, the labels fixing and augmentations are applied separately. I think it would be also interesting to see how the accuracy changes after applying <code>CleanLearning</code> and then <code>AutoAlbument</code>. But I haven’t found out an easy way to get the corrected labels from CleanLearning to overwrite the initial labels of the dataset. I’ve messaged the <code>cleanlab</code> team to get their help on this.</li> </ul> </li> <li><strong>Automatic Augmentations with <a href="https://albumentations.ai/docs/autoalbument/">AutoAlbument</a></strong>. <ul> <li>I’ve only used <code>epochs = 15</code> to find optimal augmentation policy. 
I’ll try out more epochs ranges to see if this can improve the accuracy.</li> <li>I have also used the whole train and validation datasets for finding the optimal augmentation policy. I then applied augmentations only to train dataset and validated it on the test dataset.</li> </ul> </li> <li><strong>Augmentations with Basic Augmentations and <a href="https://pytorch.org/vision/main/transforms.html#automatic-augmentation-transforms">Pre-trained Torch Policies</a></strong> <ul> <li>I was surprised by the accuracy improvement after applying basic transformations, such as <code>HorizontalFlip()</code>, <code>RandomCrop()</code>, and <code>RandomErasing().</code></li> <li>I only applied augmentations to the train dataset.</li> </ul> </li> </ul>danielgareev@gmail.com (lowlypalace)Tue, 19 Jul 2022 13:48:34 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-5/GSoC Blog | Activeloop | Week 4https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-4/<h2>What did you do this week?</h2> <p>This week, I've been focusing on implementing custom cross-validation algorithm to compute out-of-sample probabilities for Hub Datasets.</p> <h2>What is coming up next?</h2> <p>As a next step, I will be working on a generic high-level pipeline that consists of fixing labels on a particular dataset, applying augmentations to a dataset and finding slices that underperform on a particular dataset.</p> <h2>Did you get stuck anywhere?</h2> <p>Not particularly, I had a few issues with ensuring that my experiments are deterministic with PyTorch but at the end I was able to fix it.</p>danielgareev@gmail.com (lowlypalace)Mon, 18 Jul 2022 09:02:55 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-4/GSoC Blog | Activeloop | Week 3https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-3/<h2>Overview</h2> <p>This week, I've been mainly working on making Hub datasets compatible with cleanlab. I implemented three tools to benchmark how cleanlab would work on the same dataset fetched from different sources.</p> <h2>Hub Dataset + Dataloader + Skorch</h2> <p>The first tool allows to run cleanlab with Hub dataset format and allows to directly pass custom Hub Dataloader. As cleanlab features leverage scikit-learn compatibility, I wrap the PyTorch neural net using skorch, which makes it scikit-learn-compatible. However, I had to overwrite a few of the methods such as get_dataset, <code>get_iterator</code>, <code>train_step_single</code>, <code>evaluation_step</code> and <code>validation_step</code> in the generic <code>NeuralNet</code> class to make Hub datasets work with skorch.</p> <h2>Pytorch Dataset + Pytorch Dataloader + Skorch</h2> <p>The second tool fetches the same data from torch.datasets, however, this time I didn't need to overwrite any scorch <code>NeuralNet </code>methods as they support standard PyTorch datasets and Dataloader by default. This step was mainly to ensure that I'm handling the Hub dataset format properly and to compare that the metrics for training match the one with the Hub dataset format.</p> <h2>Computing Out-of-sample Probabilities with Cross Validation</h2> <p>The third tool works with Hub datasets, however, it doesn't use skorch. Instead, this tool computes out-of-sample probabilities using cross validation for Hub dataset format. 
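</p> <p>The core idea is sketched below as a simplified, in-memory version (the actual implementation iterates over Hub dataloaders; <code>make_model</code> is a hypothetical factory that returns a fresh scikit-learn-compatible classifier, e.g. a skorch-wrapped network):</p> <pre><code>import numpy as np
from sklearn.model_selection import StratifiedKFold

def out_of_sample_probs(X, y, make_model, n_folds=5, seed=0):
    """Predicted probabilities where each sample is scored by a model that never saw it in training."""
    n_classes = len(np.unique(y))
    pred_probs = np.zeros((len(y), n_classes))
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, holdout_idx in skf.split(X, y):
        model = make_model()  # fresh classifier for every fold
        model.fit(X[train_idx], y[train_idx])
        pred_probs[holdout_idx] = model.predict_proba(X[holdout_idx])
    return pred_probs
</code></pre> <p>These out-of-sample probabilities are exactly what cleanlab needs to find label issues.</p> <p>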
As skorch doesn't include functionality for cross-validation that's required by cleanlab, this week I focused on implementing cross-validation from scratch.</p>danielgareev@gmail.com (lowlypalace)Sun, 17 Jul 2022 22:27:16 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-3/GSoC Blog | Activeloop | Week 2https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-2/<h2>What did you do this week?</h2> <p>This week, I've been focusing on running experiments with automatic dataset augmentations as well as labels fixing.</p> <p>As a first experiment, I have used cleanlab to automatically find label issues in MNIST dataset. As I'm using an open-source tool cleanlab, it requires to get out-of-sample probabilities for each sample in the dataset. For that, I had to use cross-validation. There are two main ways to implement cross validation for neural networks: 1) wrap the model into sklearn-compatible model and use cross validation from scikit-learn, 2) implement your own cross validation algorithm and extract probabilities from each fold. I have been experimenting with both of the ways. First, I tried to use skorch Python library that wraps PyTorch model into a sklearn-compatible model. I had to overwrite a few methods to make it compatible with Hub datasets. <br> <br> As a second experiment, I have implemented data augmentation pipeline with Pre-Trained Policies in Torchvision as well as compared them to the baseline and plain approaches.</p> <ul> <li><strong>Plain</strong> — only <code>Normalize()</code> operation is applied.</li> <li><strong>Baseline</strong> — combination of <code>HorizontalFlip()</code>, <code>RandomCrop()</code>, and <code>RandomErasing()</code> .</li> <li><strong>AutoAugment</strong> — ****policy where <code>AutoAugment</code> is an additional transformation along with the baseline configuration. <a href="https://colab.research.google.com/drive/1d0x1rDCwANnymb6JnvWVYHQ6HTfravus?usp=sharing">Augmentation Example</a> on Colab. <code>torchvision</code> provides pre-trained policies on datasets like CIFAR-10, ImageNet, or SVHN. All of these are available in <code>AutoAugemntPolicy</code> package.</li> </ul> <p>After applying data transformations, I trained the models with Fashion-MNIST dataset on 40 epochs. 
Below, I show the results that I've obtained.</p> <h2>Plain</h2> <pre><code>transform_train = transforms.Compose([ transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,)), ]) </code></pre> <pre><code>Epoch: [40 | 40] LR: 0.100000 Processing |################################| (469/469) Data: 0.003s | Batch: 0.021s | Total: 0:00:09 | ETA: 0:00:01 | Loss: 0.1647 | top1: 94.2267 | top5: 99.9650 Processing |################################| (100/100) Data: 0.007s | Batch: 0.017s | Total: 0:00:01 | ETA: 0:00:01 | Loss: 0.2718 | top1: 90.8600 | top5: 99.8200 92.22 </code></pre> <h2>Baseline</h2> <pre><code>transform_train = transforms.Compose([ transforms.RandomHorizontalFlip(), transforms.RandomCrop(32, 4), transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,)), transforms.RandomErasing() ]) </code></pre> <pre><code>Epoch: [40 | 40] LR: 0.100000 Processing |################################| (469/469) Data: 0.023s | Batch: 0.043s | Total: 0:00:20 | ETA: 0:00:01 | Loss: 0.2602 | top1: 90.5000 | top5: 99.9117 Processing |################################| (100/100) Data: 0.008s | Batch: 0.018s | Total: 0:00:01 | ETA: 0:00:01 | Loss: 0.2430 | top1: 91.3800 | top5: 99.8700 Best acc: 91.38 </code></pre> <h2><strong>AutoAugment</strong></h2> <pre><code>transform_train = transforms.Compose([ transforms.AutoAugment(AutoAugmentPolicy.IMAGENET), transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,)), ]) </code></pre> <pre><code>Epoch: [40 | 40] LR: 0.100000 Processing |################################| (469/469) Data: 0.033s | Batch: 0.054s | Total: 0:00:25 | ETA: 0:00:01 | Loss: 0.2633 | top1: 90.4683 | top5: 99.9067 Processing |################################| (100/100) Data: 0.006s | Batch: 0.016s | Total: 0:00:01 | ETA: 0:00:01 | Loss: 0.2281 | top1: 91.6000 | top5: 99.9500 Best acc: 92.03 </code></pre> <p> </p> <h2>What is coming up next?</h2> <p>As a next step, I will be working on making cleanlab compatible with Hub datasets. Specifically, I will implement my own cross validation algorithm to obtain out-of-sample probabilities.</p> <h2>Did you get stuck anywhere?</h2> <p>I mainly had design questions on how the users will be interacting with the API, but I've communicated with the team and resolved the doubts.</p>danielgareev@gmail.com (lowlypalace)Wed, 29 Jun 2022 13:22:15 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-2/GSoC Blog | Activeloop | Week 1https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-1/<p>This week, along with the community bonding period, I had a deep dive into the codebase of the project. Along with that, the project has a strong research component. The goal of the project is to offer users a set of automatic tools that they can use to improve the overall quality of their datasets. Therefore, I focused on researching various data-centric tools (e.g. auto-augmentation, fixing labels, slice discovery) and their trade-offs. Below, I describe a few of the data-centric tools that I discovered and experimented with. </p> <h2>1. Fix Dataset</h2> <blockquote> <p>These tools focus on identify errors in datasets. These include traditional constraint-based data cleaning methods, as well as those that use machine learning to detect and resolve data errors.</p> </blockquote> <p>The labels in datasets from real-world applications can be of far lower quality. 
<a href="https://www.technologyreview.com/2021/04/01/1021619/ai-data-errors-warp-machine-learning-progress/">Recent studies</a> have discovered that even ML benchmark datasets are full of label errors. The goal of this step would be to use one of the open-source tools, such as <a href="https://github.com/cleanlab/cleanlab">cleanlab</a>, that automatically finds and fixes errors in any ML dataset. </p> <h2>2. Auto Augmentations</h2> <blockquote> <p>A technique to increase the diversity of your training set by applying random (but realistic) transformations, such as image rotation.</p> </blockquote> <p>Automatic augmentation is useful not only increase the accuracy, it also prevents overfitting and makes models generalize better. Transformations enlarge the dataset by adding slightly modified copies of already existing images.<br> <br> 2.1 API</p> <p>First, I came up with a high-level API to automatically augment images:</p> <p><code>ds.autoaugment(task)</code></p> <p><code>ds.autoaugment()</code> takes a <code>task</code> and a set of optional parameters and returns a set of optimal augmentation policies.</p> <p><strong>Args</strong></p> <p><code>task</code> - A name of deep learning task. Supported values are <code>classification</code> and <code>semantic_segmentation</code>.</p> <p><code>num_classes</code> - An optional parameter to provide a number of distinct classes in the classification or segmentation dataset. If not given, finds the number of classes automatically.</p> <p><code>model</code> - An optional parameter to provide a custom model. By default uses <code>[pytorch-image-model](&lt;https://github.com/rwightman/pytorch-image-models&gt;)</code> for classification and <code>[segmentation_models.pytorch](&lt;https://github.com/qubvel/segmentation_models.pytorch&gt;)</code> for semantic segmentation.</p> <p><code>preprocess</code> - An optional parameter to provide preprocessing transofrms. If images have different sizes or formats, you could define preprocessing transforms (such as Resizing, Cropping and Normalization).</p> <p><strong>Returns</strong></p> <p><code>transform</code> - A wrapper function that contains discovered policies for the augmentation pipeline. This function can be applied on a complete dataset when loading the dataset or during training.</p> <p><code>ds.autoaugment()</code> produces a transform pipeline (a configuration for an augmentation pipeline). We can augment the dataset as follows:</p> <h3><br> 2.2 Data Augmentation Approaches</h3> <h4>2.2.1 Pre-Trained Policies</h4> <ul> <li><code>PyTorch</code> provides <a href="https://pytorch.org/vision/main/transforms.html#automatic-augmentation-transforms">pre-trained augmentation transforms policies</a>. 
We can try to use AutoAugment policies learned on different datasets, try it on a different dataset and compare it to the baseline with or only a few basic transformations.</li> <li><a href="https://github.com/VDIGPKU/DADA#found-policy">DADA</a> provides Data Augmentation policies found for CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.</li> <li><strong>Pros</strong> <ul> <li>No time spent on finding policies, training and validation</li> <li>No input parameters needed</li> </ul> </li> <li><strong>Cons</strong> <ul> <li>The policies are not tailored for the dataset at hand</li> </ul> </li> </ul> <h4>2.2.2 Faster AA / DADA Implementation</h4> <p>Next, I researched various automatic approaches to find data augmentation policies from data.</p> <ul> <li> <p>There are many data augmentation tools that were developed in recent years.</p> </li> <li> <p><a href="https://github.com/moskomule/dda/tree/master/faster_autoaugment">Faster AA</a> / <a href="https://github.com/VDIGPKU/DADA#license">DADA</a> are a few of the newest tools (libraries) that provide a good accuracy and time trade-off.</p> </li> <li> <p>The libraries implement only basic classification tasks with common datasets (e.g. CIFAR, SVHN).</p> </li> <li> <p>Object detection and image segmentation is not supported.</p> </li> <li> <p>The libraries are research oriented.</p> </li> </ul> <p>Table below shows the training time on ImageNet for DADA, Faster AA and Deep AA. The DADA is as twice as fast as Faster AA.</p> <table style="width: 500px;"> <tbody> <tr> <td>Number of GPUs</td> <td>DADA</td> <td>Faster AA</td> <td>Deep AA</td> </tr> <tr> <td>1 GPU</td> <td>1.3</td> <td>2.3</td> <td>96</td> </tr> <tr> <td>2 GPU</td> <td>0.6</td> <td>1.1</td> <td>48</td> </tr> <tr> <td>4 GPU</td> <td>0.3</td> <td>0.5</td> <td>24</td> </tr> <tr> <td>8 GPU</td> <td>0.1</td> <td>0.2</td> <td>12</td> </tr> </tbody> </table> <p>While the accuracy of Deep AA on ImageNet with ResNet-50 is higher than DADA and Faster AA, it is considerably slower.</p> <table style="width: 500px;"> <tbody> <tr> <td>Dataset</td> <td>DADA</td> <td>Faster AA</td> <td>Deep AA</td> </tr> <tr> <td>ImageNet (ResNet-50)</td> <td>77.5</td> <td>76.5</td> <td>78.30 ± 0.14</td> </tr> <tr> <td>ImageNet (ResNet-200)</td> <td>-</td> <td>-</td> <td>81.32 ± 0.17</td> </tr> <tr> <td>CIFAR 10 (Wide-ResNet-28-10)</td> <td>97.3</td> <td>97.4</td> <td>97.56 ± 0.14</td> </tr> <tr> <td>CIFAR 100 (Wide-ResNet-28-10)</td> <td>82.5</td> <td>82.7</td> <td>84.02 ± 0.18</td> </tr> </tbody> </table> <h4>2.2.3 Albumentations</h4> <ul> <li> <p><a href="https://github.com/albumentations-team/albumentations">Albumentations</a> supports different computer vision tasks such as classification, semantic segmentation, instance segmentation, object detection, and pose estimation.</p> </li> <li> <p>For the most image operations, Albumentations is faster than all alternatives</p> </li> <li> <p><code>AutoAlbument</code> is an AutoML tool that learns image augmentation policies from data using the <a href="https://arxiv.org/abs/1911.06987">Faster AutoAugment algorithm</a></p> <ul> <li>AutoAlbument supports image classification and semantic segmentation tasks.</li> <li>Under the hood, it uses the Faster AutoAugment algorithm.</li> <li>We can use Albumentations to utilize policies discovered by AutoAlbument. <ul> </ul> </li> </ul> </li> </ul> <p><br> Besides coding, this week I jumped on another task to get to know the Hub community better. 
I coordinated a team of open-source contributors and allocated them some of the tasks. </p> <p> </p> <p> </p>danielgareev@gmail.com (lowlypalace)Wed, 29 Jun 2022 13:01:40 +0000https://blogs.python-gsoc.org/en/lowlypalaces-blog/gsoc-blog-activeloop-week-1/