Hey. During week 8 I was working mostly on my own, since one of my mentors was on holidays. I did a few things, but on some days productivity or “throughput” just wasn’t good. Let me tell you about it.
Starting with text, I trained more text classifiers for manual tests, and opened a PR for the text codebase in ELI5 itself. In the codebase I did some refactoring and fixed broken image tests. I also added character-level explanations, and an option to remove pre-padding in text input.
Did I mention “character-level”? Indeed, I managed to resolve last blog’s issue of low accuracy for a character-based network. The solution? Increase the batch size from 30 to 500. http://theorangeduck.com/page/neural-network-not-working and http://karpathy.github.io/2019/04/25/recipe/ gave some ideas. (This fix was quite random and I got lucky, so the problem of actually knowing how to train neural nets remains!)
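To make the fix concrete, here is a minimal sketch of the idea, not the actual project code: a character-level classifier where the only change that mattered was bumping `batch_size` from 30 to 500. The vocabulary size, layer sizes and synthetic data are made up for illustration.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Synthetic stand-in for integer-encoded characters (the real data came from my manual tests)
vocab_size, maxlen = 100, 200
X = np.random.randint(0, vocab_size, size=(5000, maxlen))
y = np.random.randint(0, 2, size=(5000,))

model = Sequential([
    Embedding(vocab_size, 32),
    LSTM(64),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# batch_size=30 left the network stuck at low accuracy; 500 made the loss actually go down
model.fit(X, y, batch_size=500, epochs=5, validation_split=0.1)
```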
I also trained a network with multiple LSTM layers. Training on CPU, the first attempt didn’t even get past epoch 1. My sequences were way too long, so the solution was to cut their length. In my case I found that LSTMs realistically only work for sequences of about 100 tokens.
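Roughly what that looks like (again a sketch with made-up sizes and data, not the real code): truncate everything to about 100 tokens with `pad_sequences`, then stack the LSTM layers with `return_sequences=True`.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size = 10000
# Stand-in for integer-encoded documents of wildly varying length
docs = [np.random.randint(1, vocab_size, size=np.random.randint(20, 2000)).tolist()
        for _ in range(1000)]

maxlen = 100  # anything much longer made CPU training crawl
X = pad_sequences(docs, maxlen=maxlen)  # by default keeps the last `maxlen` tokens

model = Sequential([
    Embedding(vocab_size, 32),
    LSTM(64, return_sequences=True),  # pass the whole sequence on to the next layer
    LSTM(64),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```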
An interesting issue in refactoring the text codebase was how to handle duplicate function arguments. Specifically, we have a dispatcher function and multiple concrete functions. These concrete functions have unique arguments, but also share some. See https://github.com/TeamHG-Memex/eli5/blob/453b0da382db2507972cf31bb25e68dae5674b57/eli5/keras/explain_prediction.py for an example. I ended up not using any `**kwargs` magic and typed out all the arguments in both the dispatcher and the concrete functions myself (readable, but something to watch out for when making changes).
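As a toy illustration of the pattern (not the actual ELI5 code; the names and arguments here are made up), the dispatcher repeats every argument explicitly instead of hiding the shared ones behind `**kwargs`:

```python
# Dispatcher: all arguments spelled out, so the signature documents itself,
# but it has to be kept in sync with the concrete functions by hand.
def explain_prediction(model, doc, targets=None, tokens=None, image=None):
    if image is not None:
        return explain_prediction_image(model, doc, targets=targets, image=image)
    return explain_prediction_text(model, doc, targets=targets, tokens=tokens)

def explain_prediction_image(model, doc, targets=None, image=None):
    ...  # image-specific explanation

def explain_prediction_text(model, doc, targets=None, tokens=None):
    ...  # text-specific explanation
```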
I worked through the PyTorch 60 Minute Blitz tutorial (mostly just copy-pasting code and looking up docs for some objects). I must admit that I didn’t really understand autograd, and I will need to go through it more carefully before doing gradient calculations.
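For my own reference, the core autograd idea from the tutorial boils down to something like this (the numbers are just an example I picked):

```python
import torch

# Operations on tensors with requires_grad=True are recorded in a graph...
x = torch.ones(2, 2, requires_grad=True)
y = 3 * x + 1
out = (y ** 2).mean()

# ...and backward() walks that graph to fill x.grad with d(out)/dx.
out.backward()
print(x.grad)  # each entry is 6.0: (1/4) * 2 * (3*1 + 1) * 3
```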
I went on to load a pretrained model from torchvision together with ImageNet data (since the full dataset is about 150 GB, I loaded just a few local samples). I will use this setup when writing the Grad-CAM code.
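The setup is roughly the following sketch (the model choice and the 'cat.jpg' path are placeholders; the normalisation constants are the standard ImageNet ones):

```python
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet34(pretrained=True)  # any pretrained torchvision model would do
model.eval()

# Standard ImageNet preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open('cat.jpg')           # a local sample instead of the full dataset
batch = preprocess(img).unsqueeze(0)  # add the batch dimension
with torch.no_grad():
    scores = model(batch)
print(scores.argmax(dim=1))           # predicted ImageNet class index
```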
Workflow-wise, I discovered that using the PYTHONPATH variable, for example running `PYTHONPATH=$PWD/SOMEPATH jupyter notebook`, is a convenient way to make sure Python can find your local modules. This is much easier than relocating your scripts or modifying sys.path.
Work on the first image PR has stopped; improvements to it are now being made through the second (text) PR.
So what could I have done better productivity-wise? I wish I had progressed the text PR more. It is still at a very early WIP stage, with no tests or docs, and sprinkled with TODOs/FIXMEs. On the PyTorch side, I wish I had written some experimental Grad-CAM code. Overall it is hard not to get lazy when working without “checks”, like leaving a comment about what you are working on or syncing up in more detail during the week.
Next week we have the second evaluation. I heard that I will have to submit PRs, so it would be good to do two things: open a ‘WIP’ PyTorch PR once I have some Grad-CAM code, and remove the ‘WIP’ label from the text PR.
That’s it for the week. Hope the next one is productive!
Tomas Baltrunas