Weekly blog #3 (week 6): 01/07 to 07/07

Published: 07/07/2019

Hello there. It’s time for a weekly blog. One thing to note right away is that half of GSoC is already gone (week 6 is over)! I really do hope to put in some solid work before everything finishes.


Let me start by saying what I did this week. Firstly, once again the PR got closer to being finished. Just to mention a few of the interesting changes: with the help of my mentor I added a check, though not perfect, for whether a network outputs ‘scores’ (unconstrained values) or ‘probabilities’ (values between 0 and 1). I also had to make one argument optional for a common object in the library, without breaking too many things and without too many corrections (mypy helped massively). An interesting addition to the Grad-CAM code was checking whether differentiation failed (checking whether gradients had any None values) and making an automated test for that.
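To give an idea of those two checks, here is a minimal sketch in plain NumPy. The function names are made up for this post and are not the ones in the PR, and the range check is only a heuristic: scores that happen to fall between 0 and 1 would be mistaken for probabilities, which is why the check is "not perfect".

```python
import numpy as np


def looks_like_probabilities(outputs):
    """Heuristic: treat the outputs as probabilities if all values are in [0, 1].

    Hypothetical helper; unconstrained scores that happen to fall
    in this range would be misclassified.
    """
    outputs = np.asarray(outputs, dtype=float)
    return bool(np.all((outputs >= 0.0) & (outputs <= 1.0)))


def gradients_failed(grads):
    """Return True if any gradient is None, i.e. differentiation failed."""
    return any(g is None for g in grads)
```

An automated test can then pass a list containing a None value to the second helper and assert that the failure is detected.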


Outside the PR, the mentors made a new ELI5 release, and I merged those changes into my fork.


Regarding text, I managed to add some code to ELI5 to get a working end-to-end text explanation example (trust me, you don’t want to see the code under the hood - a bunch of hardcoding and if statements!). Improvements were made.


Through the week I came across a few issues. 


For example, at the start of Tuesday, just as I was about to do some productive work, I realised that either I or Jupyter Notebook had not saved some of the work that I had done on Monday. Big shout out to the IPython %history magic, which let me recover the commands that I had run. I’ll CTRL+S all the time from now on!


I felt that a big issue this week was a lack of productivity, especially relating to ELI5 text code. Sometimes I found myself making random changes to code, without much planning. I was overwhelmed by how much there is to refactor and implement, and I still have to deal with the problem of reusing existing code from image explanations. A couple of things helped:

  1. Again make small changes and test. Don’t make changes “into the future” that you can’t test right now.

  2. Create a simple list of TODO items in a text editor, like I did for image code before. This is great for seeing the big picture, taking tasks one-by-one instead of everything at once, and marking tasks as done.

Writing library code - code that interacts with existing code and code that may expose an API to users - is much harder than simple experimental snippets in a Jupyter Notebook. Organisation helps.


On Sunday I got down to adding virtualenvwrapper to my workflow. Goodbye “source big-relative-or-absolute-path-to-my-virtualenv/bin/activate” and hello “workon eli5gradcam”! I also tried to train a ConvNet on a multiclass classification problem (Reuters news topics), but the model kept on getting an appalling 30% accuracy. Perhaps I need a minor modification to the hyperparameters, or maybe ConvNets are not so good for multiclass classification and text tasks?


I believe that next week will be much the same as week 6, topic-wise: small PR changes, library code for text, and experimentation to make the code cover more possible models (RNNs will come - someday!).


Thank you for reading once again, and see you next time!

Tomas Baltrunas


Weekly check-in #5 (week 5): 24/06 to 30/06

Published: 07/01/2019

Hi all. I am running out of ideas for opening sentences, so let me just begin!

What did you do this week?

This week was evaluation week, though the evaluation itself was very short and work continued as usual. On Monday and Wednesday I synced up with one of my mentors to discuss our next steps: how we will incorporate text into the existing ELI5 code.


However, I wanted to get the PR out of the way, so most of what I did this week was resolving comments and opening new GitHub issues for larger TODOs and FIXMEs. Finally all the checks went green!


Only on Sunday did I get to do a bit of work on text, reusing existing ELI5 code to highlight text importance.

What is coming up next?

Now that I have a roughly working “end-to-end” example of text explanations in a Jupyter Notebook, I need to incorporate it into the ELI5 codebase (on a new branch). The example is indeed “roughly working”, so I will need to experiment and debug, and add explanations for more models (RNNs come to mind).


Thinking about the next stage - what to do by the second evaluation - I have two top priority tasks in mind. First, explain text. Second, add PyTorch support alongside Keras. Roughly the same goals as in the proposed schedule, with some differences.

Did you get stuck anywhere?

I found myself working on the PR for too long, and I think it’s harder for me to switch between tasks than to work on a single task. From next week a good resolution would be to focus on text and spend less time on the PR.


Remembering all the things I did through the week and putting them on a blog in one day also tended to be hard. My solution was to write every couple of days. This did not happen, but I did manage to write the blog a day earlier (Sunday instead of Monday)!


When working on the “end-to-end” text explanation example, I found the ELI5 objects used for HTML formatting a bit mysterious. Fortunately there was a tutorial showing highlighted text, and stepping through ELI5 code with pdb and examining the values helped.


That’s the check-in for the week. On with the work!

Tomas Baltrunas


Weekly blog #2 (week 4): 17/06 to 23/06

Published: 06/25/2019

Hi! Welcome to my second “blog post” in which I will talk about my work, problems, and solutions for week 4. Let’s jump into it...


As expected this week was a split between working on my existing Pull Request (Grad-CAM applied to images), and exploring how to apply Grad-CAM to text.


Starting with the Pull Request work, I continued resolving various mentors’ comments on it. 


At one point I wanted to make a simple test case that checks whether a warning is raised when a certain function is called while certain dependencies are missing. My problem was that I spent way too much time on that. Trying to mock dependencies or do tricks with sys.modules is hard. Since the case was quite specialised and rare, the solution was to simply hack up a manual test. Lesson - use your time economically!
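For the record, here is a rough sketch of the kind of sys.modules trick I mean. It is not what ended up in the PR (that was a manual test), and both the function and the module name are hypothetical. It relies on the fact that setting an entry in sys.modules to None makes importing that name raise an ImportError.

```python
import sys
import warnings
from unittest import mock


def show_heatmap():
    """Hypothetical function that needs an optional dependency."""
    try:
        import some_optional_image_lib  # hypothetical module name
    except ImportError:
        warnings.warn("image dependencies are missing; returning None")
        return None
    return some_optional_image_lib


def test_warns_when_dependency_missing():
    # A None entry in sys.modules makes "import <name>" raise ImportError,
    # even if the package is actually installed.
    with mock.patch.dict(sys.modules, {"some_optional_image_lib": None}):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            result = show_heatmap()
    assert result is None
    assert len(caught) == 1
    assert "missing" in str(caught[0].message)
```

This works for an import that happens inside the function call. It gets harder when the dependency is imported at module level, because the module under test then has to be reloaded as well.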


Another problem that I often met was to do with a Jupyter Notebook tutorial for the docs. Whenever I converted the notebook to .rst, I had to make some manual repetitive changes (changing path URIs of image links). Thankfully in one of the GitHub comments a mentor pointed out that there is a script in the repo dedicated to such manual notebook conversions. Shout out to “sed” and “mv”!


Lastly on the PR, when I was making changes to the Grad-CAM gradient and mean calculator, I had to deal with the concept of “dataflow programming” in TensorFlow (backend of Keras). It was a bit tricky getting used to this, as I sometimes forgot that TensorFlow operations don’t actually “happen” until you call a function to “evaluate” the instructions. After some attempts I got the code to work.


Onto the exciting topic of the week - text! I managed to train a bunch of simple text models in a Jupyter Notebook, and then applied the same Grad-CAM steps to the text models as I used for images. In many cases the results were indeed disappointing, but for one case the explanation does work!


The first issue with text was padding. In some cases it’s useful to have each text sequence be the same length, padding the text with some “padding token” if it is too short. The problem is what to do with this padding once we get a Grad-CAM heatmap (showing the importance of each token for some prediction). My solution was to simply cut the heatmap at the point where padding starts. This has produced an actual working result, though with edge cases. Another issue yet to be resolved was applying Grad-CAM onto text without padding. My models, trained with padding, but given input without padding, were way off. Padding was a big discussion point with my mentor this week.
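The "cut at the padding" idea can be sketched roughly as follows. This is only an illustration: the function name and the "<PAD>" token are invented for the example, and it assumes that padding was added at the end of the sequence. Padding at the start, or a padding token appearing in the middle of the text, are exactly the sort of edge cases that it does not handle.

```python
import numpy as np


def cut_padding(heatmap, tokens, pad_token="<PAD>"):
    """Drop the part of a token heatmap that corresponds to trailing padding.

    Hypothetical helper: assumes one heatmap value per token and
    padding placed after the real tokens.
    """
    tokens = list(tokens)
    heatmap = np.asarray(heatmap)
    # Cut at the first padding token, or keep everything if there is none.
    cut = tokens.index(pad_token) if pad_token in tokens else len(tokens)
    return heatmap[:cut], tokens[:cut]
```

For a heatmap of [0.5, 0.2, 0.0, 0.0] over the tokens "good", "film", "<PAD>", "<PAD>", only the first two values and tokens are kept.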


There are many different model architectures for text, which is another issue. As advised, to not make things more difficult for myself I decided to leave RNNs, LSTMs, etc. for later on, sticking to fully connected and convolutional networks for now. Of course, there are issues, yet to be solved, with such architectures as well. Using a traditional model with Dense layers and a Global Average Pooling layer seemed to give no results at all.


So you can see that there were many technical issues regarding text. Things went slowly. Over the weekend I looked at a Deep Learning book to have some background for text, in topics such as tokenization, vectorization, embedding, etc. To train more models faster I also wanted to install TensorFlow for GPU, but my GPU (compute capability 3.0) is too old to match the current TensorFlow requirements (at least compute capability 3.5 for easiest installation). Google Colab and Kaggle kernels here I come!


I hope this was not too long and you were able to skim through it. Next week hopefully we’ll see an “end-to-end” example of a prediction about text being explained. I might even finish with the PR (I know that I have been saying this for the last two weeks...). Also, enjoy writing for Evaluation 1. Let us pass!


Tomas Baltrunas


Weekly check-in #4 (week 3): 10/06 to 16/06

Published: 06/17/2019

Hi! It’s almost a month into GSoC now.

What did you do this week?

In week 3 I submitted a Work-in-Progress Pull Request (https://github.com/TeamHG-Memex/eli5/pull/315) for explaining Keras image classifiers. The rest of the week was making changes to this PR. One of the mentors added plenty of comments to it, so that kept me busy. I have changed many things: the tutorial, docstrings, exception handling, the API (function signatures). It’s maintenance time!


What is coming up next?

Finishing up the PR soon is definitely a priority, so that I can work on other things. There are a few issues left to resolve such as optimisation and managing optional dependencies for images. I need a few more tests for coverage to look good as well!


After that, I hope to get started with text and machine learning. The mentors shared some resources related to that, such as this book and tutorials from TensorFlow and Keras docs. First I will need to get familiar with the area, and only then start applying Grad-CAM.

Did you get stuck anywhere?

I was a bit stuck getting my docs build to work. In our Sphinx automatic documentation builder, when mocking external libraries I had to declare submodules, not just the top level modules, that I have used. The syntax for docstrings was weird too.
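To illustrate the submodule point, here is a small sketch of one common way to mock imports for a docs build, by placing mock objects into sys.modules. I am not claiming this is exactly what the ELI5 docs configuration looks like, and the library name below is made up. Sphinx also has an autodoc_mock_imports option for the same purpose.

```python
import importlib
import sys
from unittest.mock import MagicMock

# Hypothetical library name. Every submodule that the code imports
# is listed explicitly, not just the top-level package.
MOCK_MODULES = [
    "fake_dl_lib",
    "fake_dl_lib.backend",
    "fake_dl_lib.layers",
]
for name in MOCK_MODULES:
    sys.modules[name] = MagicMock()

# The import now succeeds even though the library is not installed.
backend = importlib.import_module("fake_dl_lib.backend")
```

In my case, listing only the top level module was not enough: imports of the submodules kept failing until they were declared as well.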


When making the tutorial I spent way too much time trying to find a unique picture from ImageNet. The site was often slow and after giving up I returned to the good old ‘cat_dog.jpeg’.


I wanted to work on new features this week, but I was always on the PR, so that was a blocker!


We have a week left before the first evaluation. Thanks for reading and keep up the work!

Tomas Baltrunas


Weekly blog #1 (week 2): 03/06 to 09/06

Published: 06/10/2019

As per the PSF calendar, for this week I will try to write a blog post instead of the usual check-in post. I will be answering three questions: what I am working on, what I struggled with, and what solutions I have come to.


First, a recap of week 1.


I started working with Grad-CAM for Keras and images. Just to explain that, say we have a neural network that takes in an input such as an image, and gives an output, for example a category that tells you what is in the image. By using Grad-CAM, we can highlight the pixels in the image that helped the network decide on the category that it picked. We can check where the network “looks”.
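For the curious, the core Grad-CAM computation can be sketched in a few lines of NumPy. This is a simplified illustration and not the ELI5 code: it assumes that we already have the activations of a convolutional layer for one image and the gradients of the predicted category's score with respect to those activations (getting those out of Keras is the hard part).

```python
import numpy as np


def grad_cam_heatmap(activations, gradients):
    """Compute a Grad-CAM heatmap for a single image.

    Both inputs are assumed to have shape (height, width, channels).
    """
    # Weight of each channel: the mean of its gradients over all positions.
    weights = gradients.mean(axis=(0, 1))
    # Weighted sum of the activation maps, one value per position.
    heatmap = (activations * weights).sum(axis=-1)
    # ReLU: keep only the positions that support the category.
    return np.maximum(heatmap, 0.0)
```

The resulting heatmap is much smaller than the image, so it is resized and overlaid on the original picture to show where the network "looks".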


Right away I struggled with implementing such “explanations”. Like any respectable student does, I found a GitHub repo that contained all the work I needed to do and copy-pasted it real fast. In the end this worked well, but my approach was not good. I started out by adding code function-by-function, making “optimisations” as I saw fit. Unfortunately I could not check if I had made any errors, and ended up with some exceptions that I could not resolve.


The solution to this was testing, then making small changes. It’s hard to take something that does not work and make changes to it. It’s much easier to take what works and change it, then check that nothing broke. I thank the mentors for advising me on this.


Going forward to week 2, I added automated tests for what I did in week 1, and as per a sync up call with a mentor made some optimisations to the Grad-CAM implementation.


There were a couple of issues I ran into. Firstly, testing image output is a problem in itself. There were only a few comments on this online, but I talked to my mentors and we agreed that doing a rough check (checking average values in a region) would be good (pixel-by-pixel checks are too fragile). This led to some “integration” tests.
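A rough check of that kind could look something like the sketch below. The helper names, the box format and the comparison against the overall mean are assumptions made for this example and not the actual ELI5 tests.

```python
import numpy as np


def region_mean(heatmap, top, left, bottom, right):
    """Average heatmap value inside a rectangular region."""
    return float(np.asarray(heatmap)[top:bottom, left:right].mean())


def assert_attention_in_region(heatmap, box, margin=0.0):
    """Rough check: the region should be 'hotter' than the heatmap overall.

    `box` is (top, left, bottom, right) in heatmap coordinates.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    inside = region_mean(heatmap, *box)
    overall = float(heatmap.mean())
    assert inside > overall + margin, (inside, overall)
```

A test then only needs to know roughly where the object is in the picture, for example that the dog is in the left half, so small numerical differences between library versions should not break it.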


Next, I found it hard to come up with a few “unit tests”. I clarified the API with the mentors and changed some function signatures, and that helped.


I think these were the main “issues” of week 2. It went by fast and I am much happier with the code now. Looking forward to learning about RNNs and adapting Grad-CAM next week! But first I have to write some docs :(


See you next week,

Tomas Baltrunas
