Hi! Welcome to my second “blog post” in which I will talk about my work, problems, and solutions for week 4. Let’s jump into it...
As expected, this week was split between working on my existing Pull Request (Grad-CAM applied to images) and exploring how to apply Grad-CAM to text.
Starting with the Pull Request: I continued resolving my mentors’ comments on it.
At one point I wanted to write a simple test case that checks whether a warning is raised when a certain function is called while some of its dependencies are missing. My problem was that I spent way too much time on it: mocking dependencies or doing tricks with sys.modules is hard. Since the case was quite specialised and rare, the solution was simply to hack up a manual test. Lesson - use your time economically!
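Concretely, the kind of test I was attempting looks roughly like this (a sketch only - the module and function names are made up for illustration, not the actual project code):

```python
import sys

import pytest

from mypackage.explain import show_heatmap  # hypothetical function under test


def test_warns_when_optional_dependency_missing(monkeypatch):
    # A None entry in sys.modules makes any later "import matplotlib"
    # raise ImportError inside the function under test.
    monkeypatch.setitem(sys.modules, "matplotlib", None)
    with pytest.warns(UserWarning):
        show_heatmap("some input")  # should warn, not crash
```

The catch is that this trick only helps if the dependency is imported lazily inside the call; anything already imported at module load time is cached and unaffected, which is one of the things that makes this approach so fiddly.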
Another problem I kept running into concerned a Jupyter Notebook tutorial for the docs. Whenever I converted the notebook to .rst, I had to make the same repetitive manual changes (fixing the path URIs of image links). Thankfully, in one of the GitHub comments a mentor pointed out that the repo has a script dedicated to exactly this kind of notebook conversion. Shout out to “sed” and “mv”!
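For the curious, the repetitive change was roughly of this shape (the paths and pattern below are made up for illustration; the repo’s own script handles this properly):

```python
# Rewrite image links in the generated .rst so they point at the right
# place in the docs tree (hypothetical paths, for illustration only).
import re
from pathlib import Path

rst = Path("docs/source/tutorials/my-tutorial.rst")
text = rst.read_text()
text = re.sub(r"\.\. image:: my-tutorial_files/",
              ".. image:: static/my-tutorial_files/", text)
rst.write_text(text)
```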
Lastly on the PR: when making changes to the Grad-CAM gradient and mean calculations, I had to deal with the concept of “dataflow programming” in TensorFlow (the backend of Keras). It took a bit of getting used to, as I sometimes forgot that TensorFlow operations don’t actually “happen” until you call a function to “evaluate” the instructions. After a few attempts I got the code to work.
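To make the gotcha concrete, here is a tiny sketch with the graph-mode Keras backend (not the PR code, just the idea):

```python
import keras.backend as K
import numpy as np

x = K.placeholder(shape=(None, 10))
y = K.sum(x ** 2)
grads = K.gradients(y, x)[0]        # still symbolic: no numbers computed yet
mean_grads = K.mean(grads, axis=0)  # also symbolic

# Only now does anything actually "happen".
evaluate = K.function([x], [mean_grads])
print(evaluate([np.ones((4, 10), dtype="float32")])[0])
# gradient of sum(x**2) is 2x, so this prints an array of 2.0s
```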
On to the exciting topic of the week - text! I managed to train a bunch of simple text models in a Jupyter Notebook, and then applied the same Grad-CAM steps to them that I used for images. In many cases the results were indeed disappointing, but in one case the explanation does work!
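For reference, the steps I carried over from images look roughly like this (a simplified sketch, not my notebook code; the layer name, shapes, and final ReLU are assumptions based on the original Grad-CAM recipe):

```python
import keras.backend as K
import numpy as np

def grad_cam_text(model, doc, class_idx, conv_layer_name="conv1d_1"):
    """Return one importance value per token position (a 1D heatmap)."""
    conv_output = model.get_layer(conv_layer_name).output   # (1, tokens, filters)
    score = model.output[:, class_idx]                       # score of the class to explain

    grads = K.gradients(score, conv_output)[0]               # d(score)/d(activations)
    weights = K.mean(grads, axis=1)                          # average over positions -> (1, filters)
    cam = K.sum(conv_output * K.expand_dims(weights, 1), axis=-1)  # weighted sum -> (1, tokens)

    evaluate = K.function([model.input], [cam])
    heatmap = evaluate([doc])[0][0]
    return np.maximum(heatmap, 0)                            # keep only positive influence
```

Here doc is a single tokenized (and padded) sequence with a batch dimension, which brings me to the next problem.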
The first issue with text was padding. In some cases it’s useful to make every text sequence the same length, padding a sequence with some “padding token” if it is too short. The problem is what to do with this padding once we get a Grad-CAM heatmap (showing the importance of each token for some prediction). My solution was simply to cut the heatmap at the point where the padding starts. This produced an actual working result, though with edge cases. Another issue, yet to be resolved, is applying Grad-CAM to text without padding: my models, trained with padding but given unpadded input, were way off. Padding was a big discussion point with my mentor this week.
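In code, “cutting the heatmap” amounts to very little (assuming post-padding and a padding id of 0 - both assumptions for this sketch, not necessarily what my notebooks use):

```python
import numpy as np

def trim_padding(tokens, heatmap, pad_value=0):
    """Drop the heatmap entries that correspond to padding tokens."""
    pad_positions = np.where(tokens == pad_value)[0]
    if len(pad_positions) == 0:
        return heatmap                 # sequence was long enough, no padding
    return heatmap[:pad_positions[0]]  # padding assumed to sit at the end
```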
Another issue is that there are many different model architectures for text. As advised, to avoid making things harder for myself I decided to leave RNNs, LSTMs, etc. for later, sticking to fully connected and convolutional networks for now (see the sketch below). Of course, these architectures have unsolved issues as well: a traditional model with Dense layers and a Global Average Pooling layer seemed to give no results at all.
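For context, the kind of deliberately simple model I am working with looks something like this (the hyperparameters are placeholders, not the ones from my notebooks):

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalAveragePooling1D, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=64, input_length=200),  # token ids -> dense vectors
    Conv1D(filters=128, kernel_size=5, activation="relu"),        # the layer Grad-CAM looks at
    GlobalAveragePooling1D(),
    Dense(1, activation="sigmoid"),                               # e.g. binary sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```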
So you can see there were many technical issues with text, and things went slowly. Over the weekend I looked at a Deep Learning book to get some background on text topics such as tokenization, vectorization, and embeddings. To train more models faster I also wanted to install TensorFlow for GPU, but my GPU (compute capability 3.0) is too old for the current TensorFlow requirements (at least compute capability 3.5 for the easiest installation). Google Colab and Kaggle kernels, here I come!
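In Keras terms, that vocabulary from the book maps onto just a few calls (the texts and numbers below are made up):

```python
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

texts = ["the movie was great", "the movie was terrible"]

tokenizer = Tokenizer(num_words=10000)            # tokenization: split text, build a vocabulary
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)   # vectorization: tokens -> integer ids
padded = pad_sequences(sequences, maxlen=200, padding="post")  # pad to a fixed length
# An Embedding layer then turns each id into a dense vector inside the model.
```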
I hope this was not too long and you were able to skim through it. Next week, hopefully, we’ll see an “end-to-end” example of a text prediction being explained. I might even finish the PR (I know I have been saying this for the last two weeks...). Also, enjoy writing for Evaluation 1. Let us pass!
Tomas Baltrunas