Week-7: Coding the Caching

Published: 07/15/2019

Hello folks,

This week I finished writing a module that caches the filter data our package wsynphot needs. That meant writing quite a lot of code, which of course required tests, and writing them deepened my understanding of unit testing. 🤓

What did I do this week?

  1. I mainly worked on creating the cache_filters module, which handles cached filter data. My tasks under this included:
    • Added a progress bar to the data download function using tqdm
    • Created functions to load filter data from cached VOTables into DataFrames
    • Documented all functions in a notebook
    • Created tests for the entire module, using several things that were new to me, such as pytest fixtures and the pandas testing framework, and figured out how to make sure tests/data is available in the built package
    • Improved the error reporting of the cache loader functions and created tests for failure cases
  2. I also set up RTD redirects to docs on GitHub Pages (the same as last week's work) for a repository of our sister project TARDIS. Amazingly, this time I figured out how to create exact redirects to GitHub Pages from the RTD index page, so now I know two ways of creating redirects: implicitly and explicitly.
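A progress bar like the one on the download function can be sketched with tqdm roughly as follows. This is a minimal sketch, not the actual wsynphot code: `download_filters` and its `fetch` parameter are hypothetical, with `fetch` standing in for whatever downloads and caches one filter's data.

```python
from tqdm import tqdm


def download_filters(filter_ids, fetch):
    """Fetch each filter in turn while tqdm draws a progress bar.

    `fetch` is a hypothetical callable that downloads (and caches) the data
    for a single filter ID and returns something describing the result.
    """
    results = {}
    # Wrapping the iterable in tqdm() is all it takes: the bar advances
    # each time the loop pulls the next item.
    for filter_id in tqdm(filter_ids, desc="Downloading filters", unit="filter"):
        results[filter_id] = fetch(filter_id)
    return results
```

Because a list has a length, tqdm can show a percentage and an ETA; for a generator you would pass `total=` explicitly. Taking `fetch` as an argument also lets a test substitute a fake that never touches the network.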


What is coming up next?

Next, I need to integrate the cache_filters module into our base module, dropping the functions that use HDF storage (the old way of accessing filter data). Then I'll create a mechanism for updating the filter data cache, since our data source, the SVO FPS, keeps getting updated.


Did I get stuck anywhere?

This week made me realize that writing tests for I/O functions is really tricky. I was confused about how to write a unit test for a download function that iteratively fetches over 4500 files. I researched such situations extensively and found that the only solution was refactoring the code for the sake of making the function testable, a practice on which opinions across the internet are divided. My mentor resolved this dilemma by telling me that I can skip lines from tests if they are either trivial or tested somewhere else, so I wrote tests only for the functions called by that massive download function.
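The approach my mentor suggested can be illustrated like this: keep the download loop thin, exclude it from coverage, and test the helper it calls. All names here (`filter_cache_path`, `download_all`, the `<facility>/<name>.vot` cache layout) are hypothetical. `# pragma: no cover` is coverage.py's default exclusion marker; placed on a `def` line, it excludes the whole function.

```python
from pathlib import Path


def filter_cache_path(cache_dir, filter_id):
    """Map an SVO-style filter ID such as 'Generic/Bessell.B' to a cache file.

    Hypothetical layout: <cache_dir>/<facility>/<rest of ID>.vot
    """
    facility, _, name = filter_id.partition("/")
    if not facility or not name:
        raise ValueError(f"Malformed filter ID: {filter_id!r}")
    return Path(cache_dir) / facility / f"{name}.vot"


def download_all(filter_ids, cache_dir, fetch):  # pragma: no cover
    # Thin I/O loop: the logic it relies on lives in helpers that are
    # unit-tested on their own, so coverage.py is told to skip it.
    for filter_id in filter_ids:
        path = filter_cache_path(cache_dir, filter_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(filter_id))


def test_filter_cache_path():
    path = filter_cache_path("/cache", "Generic/Bessell.B")
    assert path == Path("/cache/Generic/Bessell.B.vot")
```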


What was something new I learned?

📇 Writing pytest tests that access data: I learned several things while creating unit tests for my module:

  • With pytest fixtures, we can easily set up and tear down data resources in our tests.
  • To make data stored in tests/data/ available to tests, we also need to make sure the data files are listed in the package data of the setup file. For this, astropy's setup helpers even support a get_package_data() function in which we define the data we need to access from our built package.
  • For failure cases, we can use pytest.raises to write tests that check whether an expected exception is raised.
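The three ideas above can be combined in one small test module, sketched below. It is illustrative only: `load_filter` and `FilterNotCachedError` are hypothetical stand-ins for the real cache loaders, and the cached filter is stored as CSV rather than VOTable to keep the example self-contained.

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd
import pytest


class FilterNotCachedError(Exception):
    """Hypothetical error raised when a requested filter is missing from the cache."""


def load_filter(cache_dir, name):
    """Hypothetical loader: read a cached filter (CSV here) into a DataFrame."""
    path = Path(cache_dir) / f"{name}.csv"
    if not path.exists():
        raise FilterNotCachedError(f"{name} is not in the cache at {cache_dir}")
    return pd.read_csv(path)


@pytest.fixture
def cache_dir():
    # Setup: build a throwaway cache holding one filter file.
    tmp = Path(tempfile.mkdtemp())
    (tmp / "Bessell.B.csv").write_text("Wavelength,Transmission\n3600,0.0\n4400,1.0\n")
    yield tmp
    # Teardown: runs after the test finishes, whether it passed or failed.
    shutil.rmtree(tmp)


def test_load_filter(cache_dir):
    expected = pd.DataFrame({"Wavelength": [3600, 4400], "Transmission": [0.0, 1.0]})
    # pandas' testing helper compares values, dtypes, columns and index.
    pd.testing.assert_frame_equal(load_filter(cache_dir, "Bessell.B"), expected)


def test_missing_filter_raises(cache_dir):
    # The test passes only if the expected exception is raised inside the block.
    with pytest.raises(FilterNotCachedError):
        load_filter(cache_dir, "Bessell.V")
```

pytest also ships a built-in `tmp_path` fixture that provides a fresh temporary directory per test, which would make the manual cleanup here unnecessary.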
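For the package-data step, a `setup_package.py` read by astropy's setup helpers might look like the sketch below. The file location, package name and glob pattern are placeholders, not taken from wsynphot.

```python
# setup_package.py inside the package that holds tests/data/
# (hypothetical location: wsynphot/io/setup_package.py)


def get_package_data():
    """Tell the setup helpers which non-Python files to ship with the package.

    Keys are dotted package names; values are glob patterns relative to that
    package's directory. The package name below is a placeholder.
    """
    return {"wsynphot.io.tests": ["data/*"]}
```

With plain setuptools, the same mapping would go into the `package_data` argument of `setup()`.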

✏️ Logger instead of print statements: In a package we often need to show the user a message, perhaps an error or information about the action they are performing with our package. For these cases, print is the obvious choice, but there is a much better option: a Logger (an object from Python's logging module). A Logger is preferred over print because logs are highly configurable: you can save them to files, locate a logging call from the line number they display, hide or show them, etc.
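A minimal sketch of swapping print for logging follows. `load_cached_filter` is a hypothetical function; the logging calls and the format fields (`%(filename)s`, `%(lineno)d`) are standard-library features.

```python
import logging

# One logger per module, named after the module, is the usual convention.
logger = logging.getLogger(__name__)


def load_cached_filter(cache, filter_id):
    """Hypothetical cache lookup that reports what it does via logging."""
    if filter_id not in cache:
        logger.error("Filter %s not found in cache", filter_id)
        return None
    logger.info("Loaded filter %s from cache", filter_id)
    return cache[filter_id]


if __name__ == "__main__":
    # The application, not the library, decides how logs look and where they go:
    # here INFO and above, with file name and line number in every message.
    logging.basicConfig(
        level=logging.INFO,
        format="%(levelname)s [%(filename)s:%(lineno)d] %(message)s",
    )
    load_cached_filter({"Generic/Bessell.B": [0.0, 1.0]}, "Generic/Bessell.B")
    load_cached_filter({}, "Generic/Bessell.V")
```

Raising the level to `logging.WARNING` would hide the INFO message without touching the function, and adding `filename=...` to `basicConfig` would send the output to a file instead.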


Thank you for reading. Stay tuned to know about my upcoming experiences!