This week I worked as a package mechanic, developing two mechanisms for our package wsynphot. 😜 First, I migrated the way our package accesses filter data from an HDF storage file to the on-disk cache. Then I started building an update mechanism that keeps the cached filter data up to date. Let me share how!
What did I do this week?
- I changed the data access methods in the base module of our package to use the filter data cache (handled by the module I created last week) instead of the HDF data file. This seemingly simple task was completed only after a few unanticipated ones:
  - Fixed a problem with accessing a calibration file in the base module. I went down several rabbit holes looking for the cause of this annoying bug; thanks to git's history search and the discussions I had with my mentor, I finally solved it.
  - Fixed the IO tests that had suddenly started failing because of a recent update to the filter data at SVO.
  - Integrated the functions from the cache_filters module into the base module and made sure everything works.
  - Removed the old code that handled HDF storage of filter data, along with some other necessary cleanup in the package.
  - Updated the quickstart notebook documentation to reflect these changes.
- Next, I started working on a cache update mechanism to ensure that the filter data on the user's disk stays in sync with the data at SVO (our data source).
  - For this, I came up with logic that compares the filter index in the cache with the one online at SVO and then computes which filters to add to or remove from the cache using set difference operations. It felt great to realize how set theory helps in solving practical problems! 😌
  - I wrote a function implementing this and solved several problems along the way, such as data type conflicts (bytes vs. Unicode strings).
  - I tested the function against a recent update at SVO but noticed discrepancies in the result. I traced them back and found that SVO had updated their web interface but not the API we use. So I emailed their team to let them know, and they fixed it right away.
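The comparison step can be sketched in plain Python as below. This is a minimal illustration, not wsynphot's actual code: the names `plan_cache_update` and `to_str` and the example filter IDs are hypothetical, and it assumes both indexes can be reduced to collections of filter ID strings. It also shows one way to handle the bytes vs. Unicode conflict: normalize everything to `str` before comparing, because in Python 3 `b"x"` and `"x"` are never equal, so a mixed comparison would report every filter as both new and stale.

```python
from typing import Iterable, List, Tuple, Union

FilterId = Union[str, bytes]


def to_str(filter_id: FilterId) -> str:
    """Decode a bytes filter ID (e.g. read back from a binary table) to str."""
    if isinstance(filter_id, bytes):
        return filter_id.decode("utf-8")
    return filter_id


def plan_cache_update(
    cached_ids: Iterable[FilterId], remote_ids: Iterable[FilterId]
) -> Tuple[List[str], List[str]]:
    """Return (filters to add, filters to remove) for the local cache.

    Hypothetical helper: compares the cached filter index with the
    remote (SVO) index using set differences.
    """
    # Normalize types first; otherwise b"A" and "A" would count as different.
    cached = {to_str(f) for f in cached_ids}
    remote = {to_str(f) for f in remote_ids}

    to_add = remote - cached     # on SVO, missing from the cache
    to_remove = cached - remote  # in the cache, no longer on SVO
    return sorted(to_add), sorted(to_remove)


if __name__ == "__main__":
    cached = [b"Generic/Bessell.U", b"Generic/Bessell.B", b"Old/Filter.X"]
    remote = ["Generic/Bessell.U", "Generic/Bessell.B", "Generic/Bessell.V"]
    print(plan_cache_update(cached, remote))
    # (['Generic/Bessell.V'], ['Old/Filter.X'])
```

With this split, only the filters in the first list would need downloading and only those in the second list deleting, instead of refetching everything. Sets also make the result independent of the order of, and any duplicates in, the two indexes.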
What is coming up next?
Now I need to finish developing this cache update mechanism. We have also planned to improve some of our package's documentation to make it accessible to more users.
Did I get stuck anywhere?
Not really; all the seemingly difficult problems I encountered eventually got solved.
What was something new I learned?
🗃️ Accessing package data: Data files stored in our package don't get included when it is built unless we list them in package_data, which is usually configured in the setup.py file. This requires specifying the paths of the files to include using file globbing patterns. I also learned that globbing is not recursive: a pattern ending in /* matches only the files directly inside that directory, so we need to give a pattern for every depth we want included.
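The non-recursive behaviour is easy to check with Python's standard glob module, which uses the same kind of wildcard patterns. The sketch below builds a throwaway directory containing a hypothetical data/top.txt and data/sub/nested.txt: the pattern data/* only reaches the first file, so each deeper level needs its own pattern. The package_data dictionary, including the package name mypackage, is likewise a hypothetical example of listing every depth, not wsynphot's real configuration.

```python
import glob
import os
import tempfile


def matched_files(root: str, patterns: list) -> list:
    """Return the files under `root` matched by any of `patterns` (relative, sorted)."""
    hits = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(root, pattern)):
            if os.path.isfile(path):  # "*" also matches sub-directories; skip them
                hits.add(os.path.relpath(path, root).replace(os.sep, "/"))
    return sorted(hits)


# Hypothetical setup.py fragment: one pattern per directory depth.
package_data = {"mypackage": ["data/*", "data/*/*"]}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "data", "sub"))
        for rel in ("data/top.txt", "data/sub/nested.txt"):
            with open(os.path.join(root, rel), "w") as f:
                f.write("example")

        print(matched_files(root, ["data/*"]))
        # ['data/top.txt']
        print(matched_files(root, package_data["mypackage"]))
        # ['data/sub/nested.txt', 'data/top.txt']
```

Python's own glob.glob also accepts ** when called with recursive=True; whether package_data honours ** depends on the setuptools version, so it is worth checking the setuptools documentation before relying on it.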
🔎 Tracing how a piece of code changed throughout a project's history: git's log command has numerous options, one of which finds all the commits that affected a given string (a function name, or any other uniquely identifying part of the code you want to search for). This came in handy when I needed to find out why a function named in one of our package's error messages didn't exist anywhere in the codebase.
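The git option for this is the "pickaxe", git log -S<string>, which lists the commits that changed the number of occurrences of that string, so it also catches the commit that deleted a function. To keep the examples in one language, the sketch below wraps it with subprocess; the helper names are hypothetical, and it assumes git is on your PATH and that `repo` points inside a git repository.

```python
import subprocess
from typing import List, Optional


def pickaxe_command(needle: str, path: Optional[str] = None) -> List[str]:
    """Build a `git log -S` command that finds commits adding or removing `needle`."""
    cmd = ["git", "log", f"-S{needle}", "--oneline"]
    if path is not None:
        cmd += ["--", path]  # limit the search to one file or directory
    return cmd


def commits_touching(needle: str, repo: str = ".", path: Optional[str] = None) -> List[str]:
    """Run the search in `repo` and return one 'hash subject' line per commit."""
    result = subprocess.run(
        pickaxe_command(needle, path),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when `repo` is not a repository
    )
    return result.stdout.splitlines()
```

For example, commits_touching("load_calibration") (a made-up function name) would list the commits that introduced or deleted it. Adding -p to the command also prints each matching diff, and the related -G<regex> option matches changed lines against a regular expression instead of counting occurrences.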
📝 How to edit and install a package simultaneously: I used to edit the code files of the installed package directly in /../site-packages/ (which is considered bad practice), but I couldn't track my changes because that directory isn't a git repo. So I searched online and discussed it with my mentor, and found that I can run setup.py develop instead of installing the package normally. This creates a .egg-link file in site-packages that points back to the project's source directory, so changes show up directly without reinstalling after every edit.
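One quick way to confirm that such an editable install is in effect is to check where Python imports the package from: it should be your source checkout rather than a copy in site-packages. The helper below is a hypothetical sketch of that check, not part of wsynphot. Note also that setup.py develop is deprecated in recent setuptools releases; pip install -e . is the usual replacement and gives the same edit-without-reinstalling workflow.

```python
import importlib
import os


def imported_from(module_name: str, project_dir: str) -> bool:
    """Return True if `module_name` is loaded from inside `project_dir`.

    With an editable install this should be True for the project's source
    checkout; with a regular install the file lives in site-packages instead.
    """
    module = importlib.import_module(module_name)
    module_file = getattr(module, "__file__", None)
    if module_file is None:  # e.g. namespace packages have no single file
        return False
    module_path = os.path.realpath(module_file)
    project = os.path.realpath(project_dir)
    try:
        # commonpath equals the project dir only if the module file is inside it
        return os.path.commonpath([module_path, project]) == project
    except ValueError:  # e.g. paths on different drives on Windows
        return False
```

For example, imported_from("wsynphot", "/path/to/your/wsynphot/checkout") (with the placeholder replaced by your clone's location) should return True after an editable install.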
Thank you for reading. Stay tuned to learn more things with me!