Week-9 (Evaluation Period-2): Finally Cached'em all

jaladh-singhal
Published: 07/29/2019

Hello folks,

Evaluation week knocked again to remind us that it has been 2 months since we started! This week I finally finished developing & testing the update cache mechanism that keeps our cached filter data up to date. The challenges I encountered while writing I/O tests for it unlocked a realm of unit testing that was new to me. 😮

What did I do this week?

  1. I finished my work on the cache_filters module, which handles cached filter data, by completing the implementation of the update cache mechanism:
    • Created configuration functions that remind the user to update their cache if more than a month has passed since the last update, by reading/writing the cache update date in the configuration file.
    • Improved the design of the codebase by decoupling the modules involved (I had to, because I got stuck in a circular import problem as soon as I added my code to the codebase).
    • Documented the use of the update function.
  2. I began writing tests for the update function, but, like the download function, it downloads a lot of files (~1K) from SVO (our data source), so I could not test it directly, only the helper functions it calls. I was not satisfied with this approach because my function was not entirely covered, so I searched extensively for a way to change the behavior of my function only while testing it. Fortunately, I found out about test mocking & monkeypatching. 🧐
    • Hence I drastically reduced the huge data dependency of my update & download functions by creating a mock function that downloads only a very small number of files (<10), and I injected it each time these functions are tested by using the pytest monkeypatch fixture.
    • I also realized that, to prevent data from being manipulated in unexpected ways, I need separate directories while testing my I/O functions: a temporary directory for the functions that write data to disk, and the tests/data directory for the functions that read data from disk. So I used the pytest tmpdir_factory fixture for the tests of my update & download functions, which write data.
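The mock-plus-temporary-directory setup described above can be sketched as one self-contained test file. This is a minimal sketch, not the package's actual code: get_filter_index, fetch_filter, download_filters and MOCK_IDS are hypothetical names, and the two "network" helpers simply raise so that an accidental real call fails loudly. Only monkeypatch.setattr and tmpdir_factory.mktemp are real pytest APIs.

```python
import os
import sys


def get_filter_index():
    """Hypothetical helper: would ask SVO for the IDs of all ~1K filters."""
    raise RuntimeError("this sketch never talks to SVO; tests must replace it")


def fetch_filter(filter_id):
    """Hypothetical helper: would download one filter's data from SVO."""
    raise RuntimeError("this sketch never talks to SVO; tests must replace it")


def download_filters(cache_dir):
    """Save every indexed filter as <cache_dir>/<id with '/' -> '_'>.vot."""
    paths = []
    for filter_id in get_filter_index():
        path = os.path.join(str(cache_dir), filter_id.replace("/", "_") + ".vot")
        with open(path, "w") as f:
            f.write(fetch_filter(filter_id))
        paths.append(path)
    return paths


# A tiny fake index (<10 IDs) standing in for the real ~1K-entry one.
MOCK_IDS = ["Generic/Bessel.U", "Generic/Bessel.B", "Generic/Bessel.V"]


def test_download_filters(monkeypatch, tmpdir_factory):
    # Inject the mocks: monkeypatch swaps these module attributes for this
    # test only and restores the originals when the test finishes.
    # In a real package you would pass the imported module (e.g. cache_filters).
    this_module = sys.modules[__name__]
    monkeypatch.setattr(this_module, "get_filter_index", lambda: MOCK_IDS)
    monkeypatch.setattr(this_module, "fetch_filter", lambda fid: f"data for {fid}")

    # Write into a fresh temporary directory, never into tests/data or the real cache.
    cache_dir = tmpdir_factory.mktemp("filter_cache")
    paths = download_filters(cache_dir)

    # Assert what was done: the right files, in the right place, with the right content.
    assert sorted(os.listdir(str(cache_dir))) == [
        "Generic_Bessel.B.vot",
        "Generic_Bessel.U.vot",
        "Generic_Bessel.V.vot",
    ]
    with open(paths[0]) as f:
        assert f.read() == "data for Generic/Bessel.U"
```

Running it with pytest supplies both fixtures. Because tmpdir_factory is session-scoped, a larger suite would typically wrap it in a module- or session-scoped fixture so that several tests share one temporary cache directory.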

 

What is coming up next?

My mentor asked me to make nbsphinx execute the notebooks each time it builds the docs, to check that the code in our package works fine. So I need to configure our docs building pipeline to use the cached filter data when executing the notebooks, which I can possibly do by storing the cache on Azure as a pipeline artifact.
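nbsphinx decides whether to run notebooks through its nbsphinx_execute option in Sphinx's conf.py. A minimal excerpt, assuming nbsphinx is used as a Sphinx extension; the Azure artifact step is not shown, and this is an illustration rather than the project's actual configuration:

```python
# conf.py (excerpt): illustrative, not the project's real file.
extensions = [
    "nbsphinx",
]

# Run every notebook on each docs build, even if it already has stored outputs,
# so broken example code in the package shows up as a failed build.
nbsphinx_execute = "always"

# nbsphinx already stops the build when a cell raises (the default is False);
# it is set explicitly here only to make that intent visible.
nbsphinx_allow_errors = False
```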

 

Did I get stuck anywhere?

Yes, I got stuck on several problems this week, like circular imports, testing a function with huge dependencies, etc. But by patiently searching for solutions & trying them, I resolved all of them, ultimately improving our package.

 

What about Evaluations?
I can't believe it has been almost 2 months in this GSoC journey! When I look back, I see that a lot of work has been done and a lot more is waiting to be done. And obviously I passed my 2nd evaluation, as I'm writing this blog post. My mentor is on vacation, so unlike the 1st evaluation, he couldn't send me feedback yet; he said he would provide it later.
😊

 

What was something new I learned?

📑 Purpose of a configuration file in a package: It gives users control over various parameters used in the package without having to delve into the source code to make the required changes. It is handled by a configuration system in the package, e.g. a config.py file that reads and applies the values defined by the user.
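To illustrate the "remind the user to update their cache" idea from this week's work, here is a minimal sketch using the standard-library configparser. This is a swapped-in choice: the package may use a different configuration system, and the [cache] section, the last_updated option, the function names and the 30-day threshold (standing in for "one month") are all hypothetical.

```python
import configparser
import warnings
from datetime import date, timedelta

# Hypothetical names: the real package may use a different layout.
CACHE_SECTION = "cache"
DATE_OPTION = "last_updated"
MAX_AGE = timedelta(days=30)  # "more than a month" approximated as 30 days


def write_update_date(config_path, when):
    """Record the date of the latest cache update in the configuration file."""
    config = configparser.ConfigParser()
    config.read(config_path)  # keep any other settings already in the file
    if not config.has_section(CACHE_SECTION):
        config.add_section(CACHE_SECTION)
    config.set(CACHE_SECTION, DATE_OPTION, when.isoformat())
    with open(config_path, "w") as f:
        config.write(f)


def cache_is_stale(config_path, today):
    """True if no update date is recorded or the last update is older than MAX_AGE."""
    config = configparser.ConfigParser()
    config.read(config_path)  # a missing file is silently ignored
    raw = config.get(CACHE_SECTION, DATE_OPTION, fallback=None)
    if raw is None:
        return True
    return today - date.fromisoformat(raw) > MAX_AGE


def remind_to_update(config_path, today=None):
    """Warn the user when the cached filter data looks outdated."""
    stale = cache_is_stale(config_path, today or date.today())
    if stale:
        warnings.warn("Cached filter data is over a month old; consider updating the cache.")
    return stale
```

The update function would call write_update_date after a successful update, and something like remind_to_update could run when the package is imported.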

🔁 Circular dependencies: When a module imports some entity from another module, which in turn imports some other entity from the former module, you get stuck in a circular import, leading to ImportErrors & AttributeErrors! It is caused by a bad design in which the modules of your package become tightly coupled. It can be resolved by carefully tracing the import logic of your package to root out the problematic entity (a variable, function or object). It can then usually be fixed by placing that entity in the module where it logically belongs (or even in a new module) and importing it from there.
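A self-contained reproduction of both the problem and the fix. It writes toy modules into a temporary directory and imports them; all module names (bad_filters, bad_cache, good_filters, good_cache, good_constants) and their contents are made up for illustration:

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Throwaway directory on sys.path to hold the toy modules.
workdir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(workdir))


def write_modules(sources):
    """Write each {module_name: source_code} pair as a .py file in workdir."""
    for name, source in sources.items():
        (workdir / f"{name}.py").write_text(textwrap.dedent(source))
    importlib.invalidate_caches()  # make the import system notice the new files


# Tightly coupled: each module needs a name defined in the other one.
write_modules({
    "bad_filters": """
        from bad_cache import CACHE_DIR  # starts running bad_cache right away...
        DEFAULT_FILTER = "Generic/Bessel.V"  # ...so this line has not run yet
    """,
    "bad_cache": """
        from bad_filters import DEFAULT_FILTER  # bad_filters is only half-initialized
        CACHE_DIR = "~/.cache/filters"
    """,
})

try:
    importlib.import_module("bad_filters")
    circular_error = None
except ImportError as exc:
    circular_error = exc
print(type(circular_error).__name__)  # → ImportError

# Decoupled: the shared entity moves into its own module, which imports nothing.
write_modules({
    "good_constants": """
        DEFAULT_FILTER = "Generic/Bessel.V"
    """,
    "good_cache": """
        from good_constants import DEFAULT_FILTER
        CACHE_DIR = "~/.cache/filters"
    """,
    "good_filters": """
        from good_cache import CACHE_DIR
        from good_constants import DEFAULT_FILTER
    """,
})
good_filters = importlib.import_module("good_filters")
print(good_filters.DEFAULT_FILTER, good_filters.CACHE_DIR)  # → Generic/Bessel.V ~/.cache/filters
```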

📥 Unit testing a download function: I was confused about what to assert in the test of a download function, so I searched & read about it. It may seem that we should test whether a connection is made to the data source and whether the data file is fetched without errors. But that is not required, as the exception handling mechanism in our function will take care of it. Unit testing means asserting what the function should have done instead of how it did it, so we should assert that the downloaded file is in the right directory and that its contents are as expected (by reading it & comparing it with the expected data).
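A minimal sketch of such a test, assuming a hypothetical download_file(url, cache_dir) helper built on urllib. To keep the test offline, the "remote" file is served from local disk through a file:// URL, and pytest's built-in tmp_path fixture provides the scratch directories. The test only checks the outcome: where the file landed and what it contains.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, cache_dir):
    """Hypothetical helper: save the resource at url into cache_dir."""
    target = Path(cache_dir) / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


def test_download_file(tmp_path):
    # A local source file stands in for the remote data file.
    source = tmp_path / "source" / "Bessel.V.dat"
    source.parent.mkdir()
    source.write_text("4000 0.0\n5500 1.0\n")
    cache_dir = tmp_path / "cache"
    cache_dir.mkdir()

    result = download_file(source.as_uri(), cache_dir)

    # Assert the result, not the mechanics: right directory, right contents.
    assert result.parent == cache_dir
    assert result.read_text() == source.read_text()
```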

🐒 Test mocking & monkeypatching: Mocking means creating objects that imitate the behavior of real objects, and dynamically changing a piece of software (module, object or function) at runtime is referred to as monkeypatching. So while testing, we can inject mock objects in place of the complex dependencies, so that our test focuses only on the code of the unit we're testing instead of on its dependencies. This also makes tests run faster, as it minimizes the amount of real interaction with the dependency. This can be implemented with the pytest monkeypatch fixture, as explained wonderfully in this blog.

 


Thank you for reading. Stay tuned to learn about more interesting things with me!
