Hi! At the start of week 7 I learned that one of my mentors will be on holiday for the next two weeks. This means that during that time I will mostly be working by myself. Excited to see what’s going to happen!
What did you do this week?
Since this was my mentor’s last week before the holidays, we had sync-up calls on both Wednesday and Friday. We talked about how to explain RNNs and character-based models. An interesting addition to the text codebase this week was resizing, or “resampling”, 1D arrays; I ended up using scipy.signal.
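For anyone curious, the idea looks roughly like this (a minimal sketch; the array and the target length are made up for illustration, not values from the actual code):

```python
# Resample a 1D array to a new length with SciPy's FFT-based resampler.
import numpy as np
from scipy.signal import resample

weights = np.array([0.1, 0.5, 0.9, 0.4, 0.2])  # e.g. per-token scores
resized = resample(weights, 8)  # stretch the 5 values over 8 points
print(resized.shape)  # -> (8,)
```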
For the image PR, another mentor joined in and left some comments for improvement, mostly about clarifying the docs and fixing outdated information (could this be automated or tested?).
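One partial answer to that question, at least for examples inside docstrings: Python’s built-in doctest module can flag examples whose output has gone stale. A hedged sketch with a hypothetical function (not one from the actual PR):

```python
# doctest runs the examples in docstrings and fails if the output differs.
def scale(x, factor=2):
    """Scale a number.

    >>> scale(3)
    6
    """
    return x * factor

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # complains loudly when a docstring example is outdated
```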
Regarding workflow, I did a couple of things on Sunday: creating a symlink from ~/gsoc in my home directory to my long GSoC path, and removing a commit from the middle of a branch with git rebase.
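Roughly the commands involved (the long path below is a placeholder, not my actual directory):

```sh
# Point a short, convenient path at the real project directory.
ln -s /path/to/my/long/gsoc/directory ~/gsoc

# Drop a commit from the middle of a branch: rebase interactively
# starting from the parent of the offending commit...
git rebase -i <commit-to-remove>^
# ...then change that commit's line from "pick" to "drop" in the editor.
```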
What is coming up next?
I couldn't get around to any PyTorch this week, so I should definitely start on it next week.
It would be good to clean up the text branch (fix regressions, add tests, docs, a tutorial) and make a second PR. The code should explain at least some models reasonably well, so I’ll need to train more RNNs (with more than one LSTM layer, as sketched below) and a character-based model (see below for issues) for testing.
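For the multi-layer part, the plan looks roughly like this (a hedged sketch with made-up sizes; the key detail is return_sequences=True on every LSTM except the last, so each layer receives the full sequence from the one below):

```python
# A minimal two-layer LSTM classifier in Keras; vocabulary size,
# dimensions and the sigmoid output are illustrative assumptions.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=64),  # assumed vocabulary size
    LSTM(64, return_sequences=True),  # pass the whole sequence upward
    LSTM(64),                         # final layer keeps only the last state
    Dense(1, activation='sigmoid'),   # binary sentiment output
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```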
The image PR has more or less settled down. I will implement any suggestions, and once the PR gets merged I might add commits for a new release.
Did you get stuck anywhere?
The first stumbling block this week was the theory behind RNNs. I read some articles about them, but I still don’t have an exact idea of how they work. And I haven’t even started on LSTMs!
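For my own reference, the core of a vanilla RNN boils down to a single recurrence (the standard textbook formulation, not anything specific to our code): the hidden state h_t mixes the current input with the previous state, and the output is read off from it.

```latex
h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad
y_t = W_{hy} h_t + b_y
```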
But the main obstacle this week was training a character-based network. Firstly, the text datasets in Keras (for example IMDB) are built around word tokens, not characters. I had to get the original IMDB texts and build a character-level tokenizer. This took some time, but it worked.
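The tokenizer part is roughly this (a hedged sketch; the two strings stand in for the real IMDB reviews, which are loaded differently):

```python
# Character-level tokenization with Keras: each character gets an id.
from keras.preprocessing.text import Tokenizer

texts = ["This movie was great!", "Terrible. Avoid."]  # placeholder data
tokenizer = Tokenizer(char_level=True, lower=True)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)  # lists of character ids
print(len(tokenizer.word_index), sequences[0][:10])
```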
Training the network on the CPU was so slow that I couldn’t get past one epoch! My mentor suggested picking a reasonable maximum length for the sequences, say the 99th or 95th percentile of the length distribution. He also suggested training on a GPU. Since I’m a broke student who can’t afford a GPU with CUDA compute capability 3.5, I used Kaggle kernels.
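The percentile trick looks like this (continuing from the tokenizer sketch above; `sequences` and the chosen percentile are the only inputs):

```python
# Choose a maximum length covering 95% of the sequences, then pad/truncate.
import numpy as np
from keras.preprocessing.sequence import pad_sequences

lengths = [len(s) for s in sequences]
maxlen = int(np.percentile(lengths, 95))  # or 99, per my mentor's suggestion
padded = pad_sequences(sequences, maxlen=maxlen)  # 2D array, one row per text
print(padded.shape)
```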
The training worked, but the network’s accuracy stayed close to 50%, which is chance level for binary sentiment classification. This is an issue yet to be resolved with some neural network troubleshooting techniques.
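One technique I plan to try first is the classic overfit-a-tiny-batch sanity check (a hedged sketch reusing `model` and `padded` from the earlier sketches; `labels` is placeholder data): if the network cannot reach near-100% accuracy on a handful of samples, the bug is in the model or the data pipeline, not in the amount of training.

```python
# Overfit a tiny batch; anything far from 1.0 accuracy points to a bug.
import numpy as np

labels = np.array([1, 0])  # placeholder labels for the placeholder texts
tiny_x, tiny_y = padded[:2], labels[:2]
model.fit(tiny_x, tiny_y, epochs=50, verbose=0)
loss, acc = model.evaluate(tiny_x, tiny_y, verbose=0)
print(acc)  # should approach 1.0 if everything is wired up correctly
```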
That’s the update for this week. Again, I’m excited to work more independently over the next two weeks. Hopefully it won’t end with no progress and all tests failing, and we’ll get some work done!
Tomas Baltrunas