Articles on nitruspeed's Blog (https://blogs.python-gsoc.org): updates on articles published on nitruspeed's Blog.

<h1>Week 2</h1>
<p>Hello, I am Suhaas Neel. I am currently pursuing Electronics and Communications Engineering at Jawaharlal Nehru University, New Delhi. The project I am a part of aims to automate the time-intensive process of manually searching for augmentation strategies. Research on automating this search has even led to improved results on various benchmark datasets. This summer I will be focused on building the infrastructure to implement these algorithms natively with Hub.</p>
<p><strong>What did I do this week?</strong></p>
<p>I started off by implementing some more transformations, beginning with shear and translate, since those were the only geometric transformations left. Apart from this, I started implementing non-geometric transformations, including histogram equalization, posterizing, solarizing, inverting the image, and adjusting saturation, brightness and contrast. These transformations are all based on OpenCV and NumPy.</p>
<p>I also started implementing the more robust API that was discussed with my mentor the prior week. The idea was to get more functionality and a cleaner interface. This time, rather than using an array of Hub compute functions, I gave the user the option to call augmenter.add_steps(), which takes in an augmentation pipeline and one or more tensors that the pipeline should be applied to. This makes the API less clunky than asking the user to directly enter a dictionary with tensors as keys and an array of arrays as multiple augmentation pipelines.</p>
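<p>To make this concrete, here is a rough, simplified sketch of the kind of call described above. The Augmenter class, its method signatures, and the transformation helpers are only illustrative stand-ins that I wrote for this post, not Hub's actual implementation; the real version would work on Hub tensors rather than plain NumPy arrays.</p>
<pre><code>
# Illustrative sketch only: a simplified stand-in for the add_steps()-style
# interface described above, not Hub's actual implementation.
import cv2
import numpy as np


def solarize(image, threshold=128):
    # Invert all pixel values at or above the threshold.
    return np.where(image >= threshold, 255 - image, image).astype(np.uint8)


def adjust_brightness(image, delta=40):
    # Shift pixel intensities, clipping to the valid uint8 range.
    return cv2.convertScaleAbs(image, alpha=1.0, beta=delta)


class Augmenter:
    """Toy stand-in that collects (pipeline, tensors) steps and applies them."""

    def __init__(self):
        self.steps = []

    def add_steps(self, pipeline, tensors):
        # pipeline: list of (transform_fn, kwargs); tensors: tensor name(s).
        if isinstance(tensors, str):
            tensors = [tensors]
        self.steps.append((pipeline, tensors))

    def apply(self, sample):
        # sample: dict mapping tensor names to NumPy arrays.
        out = dict(sample)
        for pipeline, tensors in self.steps:
            for name in tensors:
                for fn, kwargs in pipeline:
                    out[name] = fn(out[name], **kwargs)
        return out


augmenter = Augmenter()
augmenter.add_steps(
    pipeline=[(solarize, {"threshold": 130}), (adjust_brightness, {"delta": 25})],
    tensors="images",
)

dummy = {"images": np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)}
augmented = augmenter.apply(dummy)
print(augmented["images"].shape)
</code></pre>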
<p><strong>What I plan to do next week</strong></p>
<p>At this point we have everything necessary to add this feature to Hub. The only missing piece is support for multiprocessing.</p>
<p><strong>Where I got stuck</strong></p>
<p>Initially, I implemented the augmentation pipeline dictionary rather than the add_step approach suggested by my mentor. This is something I could have avoided had I had a better sense of what the user wants from the API. I did not face many bugs this week, since the work was mostly restructuring the previous code to support multiple augmentation pipelines.</p>
<p>nitruspeed (suhaasneel22@gmail.com), Sat, 09 Jul 2022 19:10:24 +0000. https://blogs.python-gsoc.org/en/nitruspeeds-blog/week-2-4/</p>

<h1>Week 1</h1>
<p>Hello, I am Suhaas Neel. I am currently pursuing Electronics and Communications Engineering at Jawaharlal Nehru University, New Delhi. The project I am a part of aims to automate the time-intensive process of manually searching for augmentation strategies. Research on automating this search has even led to improved results on various benchmark datasets. This summer I will be focused on building the infrastructure to implement these algorithms natively with Hub.</p>
<p><strong>What did I do this week?</strong></p>
<p>Including the community bonding period, I started by looking through different algorithms, and comparisons between them, that could discover better augmentation strategies at a lower computational cost. The pioneering work in this area uses reinforcement learning, which requires a lot of compute. Compared to this, there are newer techniques that achieve similar accuracy gains with much less compute, including algorithms like faster augmentation and DADA. These two repositories also have licensing that allows us to use them in our project. Another algorithm, deep auto-augment, gives better results than these two but at a higher computational cost. Despite the higher cost, this method is still worth exploring, because even a small improvement can matter more than 100 GPU hours.</p>
<p>Apart from this, I started working on the augmentation API and read the codebases of albumentations and PyTorch to see how they have implemented theirs. I also started implementing augmentations for Hub datasets. Given how open ended this project is, with the variety of algorithms we could use to auto-augment datasets, it will surely help to have a robust augmentation implementation that works easily with the different policies we will need. The higher-level API takes in either a Hub dataset or a dataloader and an augmentation pipeline (an array of transformations along with their parameters), and returns a generator of the same size as the input dataset/dataloader, with images augmented according to the specified pipeline. To test this function I also implemented some basic image transformations, such as scaling, rotating and flipping images. (A rough sketch of this pattern is included at the end of this post.)</p>
<p><strong>What I plan to do next week</strong></p>
<p>This week my mentor mentioned the need for a more general API that could handle multiple tensors and multiple augmentation pipelines for the same tensor. I also want to build some more transformations, after which I plan to get started with experimenting with different auto-augmentation algorithms. The comparisons provided in the papers are reliable enough, but for some algorithms the authors have open-sourced only a simplified version of the original, so if we consider those algorithms we will need some experimentation of our own.</p>
<p><strong>Where I got stuck</strong></p>
<p>While working out the API, I was unsure whether to build on the current implementation of transformations for Hub, which can only return one sample per input sample because it uses the PyTorch dataloader API under the hood. In the end I started off with a fresh implementation. Apart from this I faced a few bugs that just needed a little time to resolve.</p>
<p>nitruspeed (suhaasneel22@gmail.com), Sat, 09 Jul 2022 10:07:44 +0000. https://blogs.python-gsoc.org/en/nitruspeeds-blog/week-1-3/</p>
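<p>As promised above, here is a rough, self-contained sketch of the pattern described in the Week 1 report: a pipeline given as a list of transformations with their parameters, applied lazily over a dataset to yield augmented images. The function names, the pipeline format, and the dummy dataset are illustrative stand-ins, not Hub's actual implementation.</p>
<pre><code>
# Illustrative sketch of the generator-style augmentation described in Week 1.
# The names below are stand-ins for illustration, not Hub's actual API.
import cv2
import numpy as np


def rotate(image, angle=15.0):
    # Rotate around the image centre, keeping the original canvas size.
    h, w = image.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, matrix, (w, h))


def flip_horizontal(image):
    return cv2.flip(image, 1)


def scale(image, factor=1.2):
    return cv2.resize(image, None, fx=factor, fy=factor)


def augment(dataset, pipeline):
    # dataset: any iterable of images; pipeline: list of (transform, kwargs).
    # Yields augmented images one by one, so the output matches the input size.
    for image in dataset:
        for transform, kwargs in pipeline:
            image = transform(image, **kwargs)
        yield image


# Tiny usage example with random images standing in for a Hub dataset.
dummy_dataset = [
    np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(4)
]
pipeline = [(rotate, {"angle": 10.0}), (flip_horizontal, {}), (scale, {"factor": 0.5})]

for augmented in augment(dummy_dataset, pipeline):
    print(augmented.shape)
</code></pre>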