This is a detailed blog post about my first week of coding.
My work involves adding a label feature to tag datasets, so that when I train multiple models on the same datasets I can distinguish between them. This will be very helpful while training multiple models simultaneously.
As a stepping stone, my mentor asked me to implement it for one specific type of source, and JSONSource was chosen. The codebase has grown a lot since I wrote my proposal, so most of my time was spent understanding the workflow in order to work out where the changes had to be made. After sprinkling a lot of print statements between lines to follow the code, I finally got an idea of how to achieve my goal.
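To make the idea of labels concrete, here is a minimal sketch of how records from several training runs could live in one JSON file, keyed by label. It is only an illustration under stated assumptions: the helpers `save_labeled` and `load_labeled`, the file layout, and the label names are all hypothetical and are not the actual JSONSource implementation.

```python
import json
import os
import tempfile


def save_labeled(path, label, records):
    """Store `records` under `label`, leaving other labels untouched.

    Hypothetical helper for illustration; not part of JSONSource.
    """
    data = {}
    # Only parse the file if it exists and actually has content.
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path) as fd:
            data = json.load(fd)
    data[label] = records
    with open(path, "w") as fd:
        json.dump(data, fd)


def load_labeled(path, label):
    """Return the records stored under `label`, or {} if the label is absent."""
    with open(path) as fd:
        return json.load(fd).get(label, {})


# Two models share one file, but each one sees only its own records.
path = os.path.join(tempfile.mkdtemp(), "data.json")
save_labeled(path, "model_a", {"0": {"features": {"x": 1}}})
save_labeled(path, "model_b", {"0": {"features": {"x": 2}}})
```

With this layout, writing under one label never overwrites what was stored under another, which is the property that matters when several models are trained against the same file.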
Now that I knew what changes had to be made, the challenge was not only making them but also testing them. While making the changes, there were instances where we saw some unexpected, wacky behaviour. This led to code changes in places we hadn't expected at the beginning. Thanks to this, we have now tightened up a few loose ends. After making the necessary changes, my mentor and I worked on testing them.
Initial testing was a failure, and there were many unexpected errors. Since the changes were made in a function that dumps the data into a file, all the tests that directly or indirectly used it were failing. We figured out everything except for a JSONDecodeError that occurred when we tried to call json.load() in dump_fd() in JSONSource.
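As a general illustration of this kind of error (not necessarily the cause in this case), one common way to get a JSONDecodeError from json.load() is to read from a file that is empty. The sketch below reproduces that with an in-memory stream; `load_or_empty` is a hypothetical helper, not a function from the codebase.

```python
import io
import json


def load_or_empty(fd):
    """Parse JSON from `fd`, treating an empty or whitespace-only stream as {}.

    Hypothetical helper for illustration. Genuinely malformed JSON
    still raises json.JSONDecodeError.
    """
    text = fd.read()
    if not text.strip():
        return {}
    return json.loads(text)


# json.load() on an empty stream raises JSONDecodeError.
try:
    json.load(io.StringIO(""))
    raised = False
except json.JSONDecodeError:
    raised = True
```

Whether an empty file should be treated as "no data yet" or as a real error depends on the code calling it, so a guard like this is a design choice rather than a universal fix.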
We eventually figured out why this error occurred and fixed it the following week. See the next blog post for updates on this!