Weekly check-in #4 (week 3): 10/06 to 16/06

tomasb
Published: 06/17/2019

Hi! It’s almost a month into GSoC now.

What did you do this week?

In week 3 I submitted a work-in-progress Pull Request (https://github.com/TeamHG-Memex/eli5/pull/315) for explaining Keras image classifiers. The rest of the week went into making changes to this PR. One of the mentors added plenty of comments to it, so that kept me busy. I have changed many things: the tutorial, docstrings, exception handling, and the API (function signatures). It’s maintenance time!

 

What is coming up next?

Finishing up the PR soon is definitely a priority, so that I can work on other things. There are a few issues left to resolve, such as optimisation and managing optional dependencies for images. I also need a few more tests for coverage to look good!

 

After that, I hope to get started with text and machine learning. The mentors shared some resources related to that, such as this book and tutorials from the TensorFlow and Keras docs. First I will need to get familiar with the area, and only then start applying Grad-CAM.

Did you get stuck anywhere?

I was a bit stuck getting my docs build to work. In our Sphinx documentation build, when mocking external libraries I had to declare every submodule that I used, not just the top-level modules. The syntax for docstrings was weird too.
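
To illustrate the submodule point, here is a minimal sketch. It is not the project's actual docs configuration: it assumes the mocks are installed by putting `MagicMock` objects into `sys.modules`, which is one common pattern in Sphinx `conf.py` files. The package name `fake_dl_lib` is made up, and `mock_modules` and `can_import` are hypothetical helpers.

```python
import importlib
import sys
from unittest.mock import MagicMock


def mock_modules(names):
    """Register a MagicMock under each dotted module name (hypothetical helper)."""
    for name in names:
        sys.modules[name] = MagicMock()


def can_import(name):
    """Return True if the module imports, False on ImportError."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


# Mocking only the top-level name: the mock is not a real package,
# so Python cannot resolve a dotted submodule import through it.
mock_modules(["fake_dl_lib"])
top_only = can_import("fake_dl_lib.layers")

# Declaring the submodule explicitly makes the same import succeed.
mock_modules(["fake_dl_lib", "fake_dl_lib.layers"])
with_submodule = can_import("fake_dl_lib.layers")

print(top_only, with_submodule)
```

Sphinx also has an `autodoc_mock_imports` option that serves the same purpose.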

 

When making the tutorial I spent way too much time trying to find a unique picture from ImageNet. The site was often slow and after giving up I returned to the good old ‘cat_dog.jpeg’.

 

I wanted to work on new features this week, but I was always on the PR, so that was a blocker!


 

We have a week left before the first evaluation. Thanks for reading and keep up the good work!

Tomas Baltrunas


Weekly blog #1 (week 2): 03/06 to 09/06

tomasb
Published: 06/10/2019

As per the PSF calendar, for this week I will try to write a blog post instead of the usual check-in post. I will be answering three questions: what I am working on, what I struggled with, and what solutions I have come to.

 

First, a recap of week 1.

 

I started working with Grad-CAM for Keras and images. Just to explain that, say we have a neural network that takes in an input such as an image, and gives an output, for example a category that tells you what is in the image. By using Grad-CAM, we can highlight the pixels in the image that helped the network decide on the category that it picked. We can check where the network “looks”.
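
The core Grad-CAM computation can be sketched in a few lines. This is a toy, pure-Python illustration following the formulation in the Grad-CAM paper (weight each feature map by its spatially averaged gradient, sum the weighted maps, apply ReLU); it is not the code from this project. In a real setup the activations and gradients would come from a convolutional layer of the Keras model; here they are made-up numbers, and `grad_cam` is a hypothetical function name.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps into one heatmap.

    activations, gradients: lists of K maps, each an H x W list of lists.
    gradients[k][i][j] is the gradient of the class score with respect to
    activations[k][i][j].
    """
    heatmap = None
    for fmap, grad in zip(activations, gradients):
        height, width = len(fmap), len(fmap[0])
        # Channel weight: the gradient averaged over all spatial positions.
        weight = sum(sum(row) for row in grad) / (height * width)
        if heatmap is None:
            heatmap = [[0.0] * width for _ in range(height)]
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * fmap[i][j]
    # ReLU: keep only the regions that push the class score up.
    return [[max(0.0, value) for value in row] for row in heatmap]


# Two 2x2 feature maps with toy values (not from a real network).
activations = [
    [[1.0, 0.0], [0.0, 0.0]],  # fires in the top-left corner
    [[0.0, 0.0], [0.0, 1.0]],  # fires in the bottom-right corner
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # positive on average: supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # negative on average: counts against it
]
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

Only the top-left cell stays lit, because it is the only place where a feature map with a positive weight fired.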

 

Right away I struggled with implementing such “explanations”. Like any respectable student, I found a GitHub repo that contained all the work I needed to do and copy-pasted it real fast. In the end this worked well, but my approach was not good. I started out by adding code function by function, making “optimisations” as I saw fit. Unfortunately I could not check whether I had made any errors, and ended up with some exceptions that I could not resolve.

 

The solution to this was testing, then making small changes. It’s hard to take something that does not work and make changes to it. It’s much easier to take what works and change it, then check that nothing broke. I thank the mentors for this advice.

 

Going forward to week 2, I added automated tests for what I had done in week 1 and, following a sync-up call with a mentor, made some optimisations to the Grad-CAM implementation.

 

There were a couple of issues I ran into. Firstly, testing image output is a problem in itself. There were only a few comments on this online, but I talked to my mentors and we agreed that doing a rough check (checking average values in a region) would be good (pixel-by-pixel checks are too fragile). This led to some “integration” tests.
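
A rough check of this kind might look like the sketch below. It is an illustration only, not a test from the PR: `region_mean` is a hypothetical helper, the heatmap values are invented stand-ins for a real explanation, and the 0.5 margin is an arbitrary threshold.

```python
def region_mean(image, top, left, height, width):
    """Average value inside a rectangular region of a 2D list of numbers."""
    values = [
        image[i][j]
        for i in range(top, top + height)
        for j in range(left, left + width)
    ]
    return sum(values) / len(values)


def test_heatmap_focuses_on_left_half():
    # Stand-in for the heatmap an explanation would return.
    heatmap = [
        [0.9, 0.8, 0.1, 0.0],
        [0.7, 0.9, 0.0, 0.2],
    ]
    left = region_mean(heatmap, 0, 0, 2, 2)
    right = region_mean(heatmap, 0, 2, 2, 2)
    # Compare regions instead of exact pixels, so small numeric
    # differences between library versions do not break the test.
    assert left > right + 0.5


test_heatmap_focuses_on_left_half()
```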

 

Next, I found it hard to come up with a few “unit tests”. I clarified the API with the mentors and changed some function signatures, and that helped.

 

I think these were the main “issues” of week 2. It went by fast and I am much happier with the code now. Looking forward to learning about RNNs and adapting Grad-CAM next week! But first I have to write some docs :(

 

See you next week,

Tomas Baltrunas


Weekly check-in #3 (week 1): 27/05 to 02/06

tomasb
Published: 06/03/2019

Hey! So we are done with week 1 ...

What did you do this week?

By the end of this week, I can feed Keras models and images to ELI5 and make it show visualisations of them. I have wrapped a working Grad-CAM implementation I found on GitHub, and made changes to it such as the ability to choose a layer and a prediction to do Grad-CAM on. Grad-CAM produces a “heatmap”, so I have added an “image formatter” that takes that heatmap and overlays it on the original image. During the week I caught up with my mentors, going through the Grad-CAM paper, and had a call with some Scrapy students to get to know each other.
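
The overlay step can be pictured as a per-pixel blend. The sketch below is a simplified, pure-Python illustration and not the formatter from the project: it assumes the heatmap already has the same size as the image, tints hot areas plain red instead of using a colour map, and `overlay_heatmap` and its `alpha` parameter are hypothetical names.

```python
def overlay_heatmap(image, heatmap, alpha=0.4):
    """Blend a red tint over an RGB image, stronger where the heatmap is hot.

    image: H x W list of (r, g, b) tuples with values 0-255.
    heatmap: H x W list of floats in [0, 1].
    alpha: opacity of the tint where the heatmap equals 1.
    """
    red = (255, 0, 0)
    blended = []
    for pixel_row, heat_row in zip(image, heatmap):
        row = []
        for pixel, heat in zip(pixel_row, heat_row):
            opacity = alpha * heat  # hotter cells hide more of the original
            row.append(tuple(
                round((1 - opacity) * channel + opacity * tint)
                for channel, tint in zip(pixel, red)
            ))
        blended.append(row)
    return blended


# A 1x2 grey image: the first pixel is "hot", the second is not.
image = [[(100, 100, 100), (100, 100, 100)]]
heatmap = [[1.0, 0.0]]
result = overlay_heatmap(image, heatmap)
print(result)
```

A real formatter would typically also resize the low-resolution heatmap to the image size before blending.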

What is coming up next?

Week 2 will be split between two activities. First, I will do testing. It will be essential to add some automated tests using PyTest and tox. Some manual tests using different datasets and models would also be nice. Secondly, I will need to perform optimisations, refactorings, and improvements to the code added in week 1. One interesting task will be making Grad-CAM work beyond classification-based models, e.g. regression. I hope that by the end of week 2 or 3 my branch will be in good shape for a Pull Request.

Did you get stuck anywhere?

I had trouble implementing/wrapping Grad-CAM itself (more on that in the upcoming weekly blog, but testing what works THEN changing things, and the mentors’ advice, certainly helped). A slight blocker was also the low responsiveness of my machine when testing with large models in memory. Fortunately smaller models exist, so I don’t always have to use VGG16!

 

1 down and 11 more to go! Thanks for reading!

Tomas Baltrunas


Weekly check-in #2: 21/05 to 26/05 (community bonding)

tomasb
Published: 06/03/2019

Hi! This is a check-in for the last week of community bonding. I am writing this at the end of week 1, so I hope I still remember most of what I did!

What did you do this week?

Briefly tested out tox and Sphinx on my machine. Made sure I can run all the environments specified in ELI5’s tox.ini: I had to apt-get some python-dev packages, and comment out the installation of xgboost 0.6a2 due to installation issues on Ubuntu. Set up a few virtualenvs. Installed PyCharm IDE. Git cloned existing Grad-CAM implementations for Keras and checked that they work using examples from ImageNet. Commented the source code by stepping through it with pdb. Looked at the authors’ implementation of Grad-CAM in Lua.

What is coming up next?

Week 1 tasks will begin! As scheduled I will implement Grad-CAM for images (working with Keras models). One change is that I will write automated tests during week 2, not week 1. Instead of testing this week, I will add an image formatter so that I can get immediate feedback on the Grad-CAM algorithm as implemented. There were some things that I planned to do during this week but postponed, including: enabling my GPU for use with ML libraries, learning about RNNs, and setting up virtualenvwrapper/pyenv instead of plain virtualenv.

Did you get stuck anywhere?

At the end of the week I was stuck on understanding the details of Grad-CAM and how it is implemented in Keras. I wish I could’ve spent more time on learning actual Keras and doing some CNN visualisations.

 

Thanks for reading again!

Tomas Baltrunas


Weekly check-in #1: 13/05 to 20/05

tomasb
Published: 05/20/2019

Hello everybody. Checking in for the first time under ELI5 (Scrapinghub), implementing Grad-CAM for neural networks.

What did you do this week?

Briefly looked into some auxiliary tools such as PyTest and mypy. Had a call with the mentors for the first time. Discussed workflow and preparation details, such as when to call, create pull requests, and document code. Mentioned some things to do in the future, e.g. use Google Colab when I need hardware to train a network, consider how to test image output, and go through the technical Grad-CAM paper together. In the suborg's Slack, discussed how everyone should give updates.

What is coming up next?

Set up a recommended environment: PyCharm IDE, Jupyter notebooks for manual testing, one virtualenv for development and tox for testing on other environments. Enable recent libraries such as CUDA 9 on my local NVIDIA hardware. Install, use, and look over source code of existing Grad-CAM implementations. Look over and do the relevant parts of the recommended course at http://cs231n.stanford.edu/. Briefly learn tox (make sure the project's tox config works for me locally) and look into Sphinx very briefly.

Did you get stuck anywhere?

No specific issues yet, just overall a bit slow to start looking into the actual topic of the project: Grad-CAM.

Thank you for reading!

Tomas Baltrunas
