Week-8: Remodeling the Data Access Mechanism

jaladh-singhal
Published: 07/21/2019

Hello folks,

This week I worked as a package mechanic, developing two mechanisms for our package wsynphot. 😜 First, I migrated the mechanism our package uses to access filter data from an HDF storage file to the cache on disk. Then I started setting up an update mechanism to keep the cached filter data up-to-date. Let me share how!

What did I do this week?

  1. I changed the data access methods in the base module of our package to use the filter data cache (which is handled by the module I created last week) instead of the HDF data file. This seemingly simple task was completed only after a couple of unanticipated ones:
    • Fixed a problem in the base module when accessing a calibration file. I went down a few rabbit holes to find the cause of this annoying problem, and finally solved it thanks to git's history search options and the discussions I had with my mentor.
    • Fixed the IO tests that had suddenly started failing due to a recent update to the filter data at SVO.
    • Integrated the functions from the cache_filters module into the base module and made sure they work fine.
    • Removed the older code meant for handling HDF storage of filter data, along with some other necessary cleanup in the package.
    • Re-documented the quickstart notebook to reflect these changes.
  2. Next I started working on the cache updating mechanism to ensure that the filter data on the user's disk stays up-to-date with that at SVO (our data source).
    • For this I came up with logic that compares the filter index in the cache with the one at SVO online, and then calculates the filters to add to or remove from the cache using set difference operations. It felt great to realize how set theory helps in solving practical problems! 😌
    • I created a function implementing this and solved several problems I encountered in the process, such as data type conflicts (bytes vs. unicode).
    • I tested the function against a recent update at SVO, but noticed discrepancies in the result. I traced them back to find that SVO had updated their web interface but not their API, which we're using. So I mailed their team to let them know, and they fixed it right away.
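
The set-difference idea from the second item can be sketched roughly as below. This is only an illustration, not the actual wsynphot code: the function names and the filter IDs are made up, and I am assuming that both indexes can be reduced to a plain collection of filter IDs. It also shows one way to sidestep the bytes vs. unicode conflict by decoding everything to str before comparing.

```python
def normalize_ids(filter_ids):
    """Return filter IDs as a set of str, decoding any bytes values."""
    return {
        fid.decode("utf-8") if isinstance(fid, bytes) else str(fid)
        for fid in filter_ids
    }


def diff_filter_index(cached_ids, remote_ids):
    """Work out which filters to add to and remove from the cache."""
    cached = normalize_ids(cached_ids)
    remote = normalize_ids(remote_ids)
    to_add = remote - cached      # listed at the source, missing in the cache
    to_remove = cached - remote   # still in the cache, gone from the source
    return sorted(to_add), sorted(to_remove)


# Hypothetical filter IDs, for illustration only
cached_ids = [b"Facility/Instr.A", b"Facility/Instr.B"]
remote_ids = ["Facility/Instr.B", "Facility/Instr.C"]

to_add, to_remove = diff_filter_index(cached_ids, remote_ids)
print("add:", to_add)
print("remove:", to_remove)
```

Without the decoding step, b"Facility/Instr.B" and "Facility/Instr.B" would count as different elements, so every filter would look both new and stale.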


What is coming up next?

Now I need to finish my work on this cache update mechanism. We have also planned to mend some of our package's documentation to make it accessible to more users.


Did I get stuck anywhere?

Not as such. All the seemingly difficult problems that I encountered eventually got solved.


What was something new I learned?

๐Ÿ—ƒ๏ธ Accessing package data: The data we store in our package doesn't get included while building it unless we specify it in package_data which is often configurable from setup.py file. This needs us to specify path of files we want to include in built package by using file globbing patterns. So I also learned that file globbing doesn't work recursively if we use /*, so we need to make sure to specify the paths of all depths we want to be included.

🔎 Tracing how a piece of code changed throughout the history of a project: git provides numerous options for the log command, with which you can find all the commits that affected a string (be it a function name or any other uniquely identifying part of the code you want to search for). This came in handy when I needed to know why a function named in an error message of our package didn't exist in the codebase.
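
One log option that fits this description is -S (the "pickaxe"), which lists commits that changed the number of occurrences of a string. I am assuming that is the kind of search meant here. A minimal Python wrapper around it could look like the sketch below; the helper names and the searched string are hypothetical, and commits_touching needs git installed and a real repository to run against.

```python
import subprocess


def pickaxe_command(needle, path=None):
    """Build a `git log -S<needle>` command, optionally limited to a path."""
    cmd = ["git", "log", "--oneline", "-S" + needle]
    if path is not None:
        cmd += ["--", path]
    return cmd


def commits_touching(needle, repo_dir=".", path=None):
    """Run the search in repo_dir and return one output line per commit."""
    result = subprocess.run(
        pickaxe_command(needle, path),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Show the command that would be run for a made-up function name
print(" ".join(pickaxe_command("some_missing_function")))
```

Calling commits_touching("some_missing_function") inside a repository would then list the commits where that name was introduced or removed, which is usually enough to spot when a function disappeared.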

How to edit and install a package simultaneously: Earlier I used to edit code files of the installed package in /../site-packages/ (which is also regarded as bad practice), but I couldn't track the changes I made because that directory is not in a git repo. So I looked for a better approach on the internet and discussed it with my mentor, and found that I can use setup.py develop instead of installing the package. This creates a .egg-link in site-packages pointing back to the project source directory, so that changes take effect directly without having to reinstall every time you edit the package.



Thank you for reading. Stay tuned to learn more things with me!
