jaladh-singhal's Blog

Week-12: Developing Analytical Web Apps

jaladh-singhal
Published: 08/19/2019

Hello folks,


Week-11: From Interactive Notebooks to Web Apps

jaladh-singhal
Published: 08/11/2019

Hello folks,

This week we finally came to the part I had been eagerly waiting for, i.e. developing analytical web interfaces (web app stuff) so that users can easily interact with the data to get the visualizations & results they need. 🙃 I played around with visualization stuff and dived into researching the technologies around Jupyter for making web apps from interactive notebooks. This way the developed interfaces can be accessed both locally and on the web. 😍

What did I do this week?

  1. I apparently fixed the problem that I shared in last week's blog (PR builds failing since I made the pipeline access filter data from artifacts for executing notebooks). I experimented & found that when I manually build the PR commits (being a member of my Org), the build passes, whereas it fails when triggered as a PR build (from me as a contributor). So this made it clear that we can merge the PR into our codebase. The PR build failure was due to an authorization problem when calling the REST API, for which I also opened an issue (called feedback there) in the Azure DevOps Developer Community. I followed up with their team and still need to try some things, but that can wait since it's not a priority when more important things are ahead!
  2. Since all the major work to make wsynphot stable to use is done, I finally got the chance to do the work I wanted to do: creating analytical web interfaces for our packages (a major part of my proposed project)! Our main aim is to build a web app for calculating photometry of stars, which lies at the intersection of both of our packages: starkit & wsynphot. So I started using starkit to plot a star's spectrum from grids (HDF5 files that contain relevant data of the star) and played around with it.
    • I created a table in the starkit docs, listing all test grids created by mentors that are available at Zenodo for open access. I also needed to find out the bound values of each grid's characteristics in the table, which improved my understanding of how starkit works with grids.
    • Then I followed the documentation of bqplot & ipywidgets to create an interactive plot of the spectrum, using sliders for changing the values of grid characteristics.
  3. Before advancing further in plotting, my mentor told me to try converting this simple interactive plot (a prototypical interface) into a web app. So I researched possible ways to convert an interactive Jupyter notebook into a web app, and found several interesting things to implement. 😮

 

What is coming up next?

Now I'll start implementing the several ways I found to create a web app from a Jupyter notebook. Then we will compare them & decide which one we should use, before I continue developing more interactive interfaces.

 

Did I get stuck anywhere?

Not really, this week was more about exploration & experimentation.

 

What was something new I learned?

📊 Bqplot: It is a 2-D plotting library for Jupyter, based on the Grammar of Graphics. Thus it is a perfect choice for integrating visualizations with other Jupyter interactive widgets to create integrated GUIs - which is what we want to develop at StarKit.

🚀 Variety of services Project Jupyter provides: From interactive UI controls to a tool for running notebooks in the cloud, it is really powerful & amazing!

  • IPyWidgets are used to interactively visualize and control changes in the data to instantly see how it affects the results.
  • Binder allows us to create custom computing environments on the cloud that can be used to run & share notebooks with remote users on the web. A popular choice is mybinder.org, a pre-existing deployment of BinderHub, which is built on JupyterHub.
  • JupyterLab is the next generation of the notebook interface, providing an IDE for the data science workflow!

 


Thank you for reading. Stay tuned to learn about my upcoming experiences!


Week-10: Executing notebooks from the Pipeline

jaladh-singhal
Published: 08/05/2019

Hello folks,

This week was pretty tough. A seemingly simple task of setting up the docs CD pipeline to execute notebooks ate up the entire week, thanks to the complexities of DevOps! 🙄

What did I do this week?

  1. To execute our notebooks from the docs CD pipeline, we need to make cached filter data available on the VM as a pre-build step. So I decided to store it at Azure as an artifact (Universal Package) so that each time the pipeline runs, it can instantly download the artifact into the VM and then also update it before use. Since the pipeline updates the artifact, I decided to also publish it back to the artifact feed so that next time we get more up-to-date data.
    • For the 1st build of such a pipeline, I needed to make sure that the artifact is already present in the feed - which means publishing the artifact not from the pipeline but locally, from Azure CLI. The authentication in Azure CLI was really cumbersome but I figured out how to use a PAT for it & then published the filter data as an artifact.
    • Then I wrote script steps in the pipeline to download the artifact & bring it into the right directory. The challenge here was to make the pipeline download the latest versioned artifact from the feed. After some mind-boggling research, I found how to achieve it with the Azure REST API, but the problem was again authentication while calling the API. On trying to solve it the next day with a clear head, I found the solution that was right in front of my eyes in an already opened SO thread which I had been overlooking!
    • I wrote a script to conditionally publish the filter data as a newer versioned artifact if it got updated. Here I needed to write a python command that calls the update function & returns the update status back to the bash script. But since a logger was enabled in the function, the returned value was entirely messed up - I fixed it by disabling the logger.
    • I also improved the versioning of artifacts by using the date as the version, but there were conflicts with the SemVer format accepted by Azure, so it again took time to manipulate the date into an acceptable version.
  2. Next I needed to make sure that executed notebooks give the right outputs.
    • The matplotlib plots didn't appear in the rendered notebook. After some searching & digging, I found it was because we were interactively plotting graphs using %pylab notebook. By using the %matplotlib inline magic, which works non-interactively, I made the plots appear.
    • I also cleaned some unnecessary data from the quickstart notebook & made the documentation clearer.
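The date-to-version step from the list above can be sketched in a few lines of Python. This is only an illustration under an assumption: that the conflict came from zero-padded dates, since SemVer forbids leading zeros in its numeric parts (so 2019.08.05 is not a valid version). The function name date_to_version is made up for this example.

```python
from datetime import date


def date_to_version(day: date) -> str:
    """Turn a date into a MAJOR.MINOR.PATCH string without leading zeros.

    Hypothetical helper: formatting the integer fields directly avoids the
    zero padding that strftime("%m") and strftime("%d") would add.
    """
    return f"{day.year}.{day.month}.{day.day}"


print(date_to_version(date(2019, 8, 5)))  # 2019.8.5
```

A version built this way still sorts correctly as SemVer, because each part is compared numerically rather than as text.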

 

What is coming up next?

There's still a problem with the pipeline: it is failing for PR builds although it works fine for my fork. I will try to fix that and then we will possibly move to starkit, where we can integrate it with wsynphot (on which I am currently working) to produce an interface for calculating photometry.

 

Did I get stuck anywhere?

Yes, it was these unexpected problems that made this task of getting the pipeline to execute the notebooks take an entire week! But I eventually solved all of them except the PR build failure.

 

What was something new I learned?

⚙️ This week made me learn many new things about Azure DevOps, like:

  • Azure CLI & how to use & authorize it to manage Azure resources from another (local) system
  • Azure REST API i.e. a really powerful API which lets you create/retrieve/update/delete the Azure resources
  • System.AccessToken which is a special predefined variable that is used as an OAuth token to access the REST API
  • Unlike the powershell task which runs Windows PowerShell, we can use the pwsh task on Linux VMs since it runs PowerShell Core
  • Conditionally running a step in Azure by using the condition option.

🔡 While passing a variable from a child process (python script) to the parent shell (bash), make sure that the only thing you write on stdout is the value you want to pass. This means checking that no function call in your script prints or logs anything to stdout other than that value.
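Here is a minimal sketch of that idea, assuming a made-up update_filter_data function in place of the real one. Log records are routed to stderr, so the only thing left on stdout is the status the shell should read. The capture at the end imitates what `status=$(python script.py)` would see in bash.

```python
import contextlib
import io
import logging
import sys

# Send log records to stderr so they can never mix with the value on stdout.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logger = logging.getLogger("cache_update")


def update_filter_data() -> bool:
    """Hypothetical stand-in for the real update function."""
    logger.info("Comparing cached filter index with the remote one...")
    return True  # pretend the cache changed


def main() -> None:
    updated = update_filter_data()
    # The only thing written to stdout is the value the shell should capture.
    print("true" if updated else "false")


# Capture stdout the way command substitution in bash would.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    main()
captured = buffer.getvalue().strip()
```

Disabling the logger altogether (as I did) works too; keeping it on stderr is just an alternative that preserves the log output in the build log.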

🧐 Openness to strange options while researching: When we search for the solution to a problem on the internet, a lot of new & weird information comes before us, which we just skim enough to decide that it is not for our case. But if we try to understand it anyway, we may figure out how to make it work for our case by experimenting. The same happened when I was looking for how to authorize my build for an Azure API call. On a SO thread I found a powershell script for it, but I didn't bother to understand it, thinking that a powershell script can't be of any use on a Linux VM. But when I eventually found the solution (System.AccessToken & pwsh for Linux), I was like: all this time the answer was right in front of my eyes and I was searching here & there, not caring to give it a few minutes to understand!

 


Thank you for reading. Stay tuned to learn about more interesting things with me!


Week-9 (Evaluation Period-2): Finally Cached'em all

jaladh-singhal
Published: 07/29/2019

Hello folks,

Evaluation week knocked again to remind us that it has been 2 months since we started! This week I finally finished developing & testing the update cache mechanism to keep our cached filter data up-to-date. The challenges I encountered in writing IO tests for it unlocked an unknown realm of unit-testing for me. 😮

What did I do this week?

  1. I finished up my work on the cache_filters module that handles cached filter data, by completing the implementation of the update cache mechanism:
    • Created configuration functions that remind users to update their cache if more than 1 month has passed since the last update, by reading/writing the cache update date in the configuration file.
    • Improved the design of the codebase by decoupling the modules concerned (this was because I got stuck in a circular import problem as soon as I added my code to the codebase)
    • Documented the use of the update function
  2. I began to write tests for the update function, but like the download function it downloads a lot of files (~1K) from SVO (our data source), so I could not test it directly but only the helper function it calls. I was not satisfied with this approach as my function was not getting entirely covered, so I searched extensively for a way to change the code of my function only while testing it. And fortunately I found out about test mocking & monkeypatching. 🧐
    • Hence I drastically reduced the huge data dependency of my update & download functions by creating a mock function that downloads only a very small number of files (<10). Then I injected it each time these functions are tested by using the pytest monkeypatch fixture.
    • Also I realized that to prevent the data from being manipulated in unexpected ways, I need separate directories while testing my I/O functions - a temporary directory for the functions that write data to disk, and the tests/data directory for the functions that read data from disk. So I used the pytest tmpdir_factory fixture for the tests of my update & download functions, which write the data.
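The monthly reminder from the first item can be sketched roughly like this. The names (cache_needs_update, the 30-day threshold) are assumptions for illustration and not the actual functions in the package; the date is stored as an ISO string, which is one simple way to keep it in a configuration file.

```python
from datetime import date, timedelta
from typing import Optional


def cache_needs_update(last_update: Optional[date], today: date,
                       max_age: timedelta = timedelta(days=30)) -> bool:
    """Return True when the cache was never updated or is older than max_age."""
    if last_update is None:
        return True
    return today - last_update > max_age


# The date can be written to a config file as text and parsed back later.
stored_value = date(2019, 6, 20).isoformat()     # "2019-06-20"
last_update = date.fromisoformat(stored_value)   # back to a date object

if cache_needs_update(last_update, today=date(2019, 7, 29)):
    print("Filter cache is more than a month old, consider updating it.")
```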

 

What is coming up next?

My mentor told me to make nbsphinx execute the notebooks each time it builds the docs, so as to check that the code in our package works fine. So I need to configure our docs building pipeline to consume cached filter data when executing the notebooks, which I can possibly do by storing the cache at Azure as an artifact.

 

Did I get stuck anywhere?

Yes, I got stuck on several problems this week, like circular imports, testing a function with huge dependencies, etc. But by patiently searching for solutions & trying them, I resolved all of those problems, ultimately improving our package.

 

What about Evaluations?
I can't believe it has been almost 2 months in this GSoC journey! When I look back, I see a lot of work has been done and a lot more is waiting to be done. And obviously I passed my 2nd evaluation, as I'm writing this blog post. My mentor is on vacation so, unlike the 1st evaluation, he couldn't write feedback for me yet, but he said he will provide it later.
😊

 

What was something new I learned?

📑 Purpose of Configuration file in a package: It gives users control of various parameters used in the package without delving into the source code to make the required changes. It is handled by a configuration system in the package e.g. a config.py file to use & treat the data defined by user.

🔁 Circular dependencies: When you import some entity into a module from another module, which in turn imports some other entity from the former module - you get stuck in a circular import leading to ImportErrors & AttributeErrors! It is due to a bad design in which modules of your package become tightly coupled. It can be resolved by carefully tracing the import logic of your package to root out the problematic entity (a variable, function or object). Then it can usually be fixed by placing that entity in the right module (or even in a new module) where it logically belongs, and then importing it from there.

📥 Unit testing a download function: I was confused about what to assert in the test of a download function so I searched & read about it. It may appear that we should test whether a connection was made to the data source and whether the data file was fetched without errors. But that is not required, as the exception handling mechanism in our function will do that. Unit testing means we assert what the function should have done instead of how it has done it, so we should assert that the downloaded file is in the right directory and its contents are as expected (by reading & comparing them with expected data).
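A small sketch of that "assert the result, not the mechanism" idea, using only the standard library. The downloader here is a made-up stand-in that writes fixed text instead of contacting a server, and tempfile plays the role that pytest's temporary directory fixtures play in a real test suite.

```python
import tempfile
from pathlib import Path

EXPECTED = "wavelength,transmission\n5000,0.9\n"


def download_filter(filter_id: str, cache_dir: Path) -> Path:
    """Hypothetical downloader: writes fake data instead of using the network."""
    target = cache_dir / f"{filter_id}.dat"
    target.write_text(EXPECTED)
    return target


def check_download() -> bool:
    """Assert on the outcome: file location and file contents."""
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp)
        path = download_filter("demo_filter", cache_dir)
        in_right_place = path.parent == cache_dir and path.exists()
        right_contents = path.read_text() == EXPECTED
    return in_right_place and right_contents
```

Because everything is written inside a temporary directory, the test leaves no files behind and can never touch the real cache.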

🐒 Test mocking & monkeypatching: Mocking means creating objects that imitate the behavior of real objects, and dynamically changing a piece of software (module, object or function) at runtime is referred to as monkeypatching. So while testing, we can substitute the complex dependencies with mock objects so that our test only focuses on the code of the unit we're testing, instead of on its dependencies. This also helps in making tests run faster as it minimizes the amount of real interaction with the dependency. This can be implemented by using the pytest monkeypatch fixture, as explained wonderfully in this blog.

 


Thank you for reading. Stay tuned to learn about more interesting things with me!


Week-8: Remodeling the data access Mechanism

jaladh-singhal
Published: 07/21/2019

Hello folks,

This week I worked as a package mechanic, developing two mechanisms for our package wsynphot. 😜 Firstly I migrated the mechanism used by our package to access filter data from HDF storage to the cache on disk. Then I started setting up an update mechanism for the cached filter data to keep it up-to-date. Let me share how!

What did I do this week?

  1. I changed the data access methods in the base module of our package to use the filter data cache (which is handled by the module I created last week), instead of using the HDF data file. This seemingly simple task was successfully completed after doing a couple of unanticipated tasks:
    • Fixed a problem in the base module with accessing a calibration file. I went down rabbit holes to find the cause of this annoying problem - thanks to the history search options of git and the discussions I had with my mentor, I finally solved it.
    • Fixed the IO tests that suddenly started failing due to a recent update in the filter data at SVO.
    • Integrated the functions from the cache_filters module into the base module and made sure it works fine.
    • Removed the older code that was meant for handling HDF storage of filter data, along with some other necessary cleanup in the package.
    • Re-documented the quickstart notebook due to these changes.
  2. Next I started working on the cache updating mechanism to ensure that the filter data on the user's disk is up-to-date with that at SVO (our data source).
    • For this I came up with a logic to compare the filter index in the cache with that at SVO online and then calculate the filters to add/remove in the cache by using set difference operations. I felt great about it, realizing how set theory helps in solving practical problems! 😌
    • I created a function for implementing the same and solved several problems I encountered in the process, like data type conflicts (byte vs unicode), etc.
    • I tested the function against a recent update at SVO, but I noticed discrepancies in the result, which I traced back to find that they had updated their web interface but not their API which we're using. So I mailed their team to inform them about it & they instantly fixed it.
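The set difference logic described in the list above boils down to something like the following sketch. The function names and the sample filter IDs are made up, and the bytes-to-text normalization is only one plausible way to handle the byte vs unicode mismatch.

```python
def _as_text(value):
    """Normalize an ID to str so bytes and unicode IDs compare equal."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def plan_cache_update(cached_ids, remote_ids):
    """Return (to_add, to_remove) so that the cache ends up matching the remote index."""
    cached = {_as_text(i) for i in cached_ids}
    remote = {_as_text(i) for i in remote_ids}
    to_add = sorted(remote - cached)      # present remotely, missing in the cache
    to_remove = sorted(cached - remote)   # in the cache, but gone from the remote index
    return to_add, to_remove


to_add, to_remove = plan_cache_update(
    cached_ids=[b"filter_a", b"filter_b"],
    remote_ids=["filter_b", "filter_c"],
)
print(to_add, to_remove)  # ['filter_c'] ['filter_a']
```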

 

What is coming up next?

Now I need to finish up my work on developing this update cache mechanism. Also we have planned to mend some documentation of our package to make it accessible for more users.

 

Did I get stuck anywhere?

Not as such, all the seemingly difficult problems that I encountered, eventually got solved.

 

What was something new I learned?

🗃️ Accessing package data: The data we store in our package doesn't get included while building it unless we specify it in package_data, which is often configurable from the setup.py file. This requires us to specify the paths of the files we want to include in the built package by using file globbing patterns. I also learned that file globbing doesn't work recursively if we use /*, so we need to make sure to specify the paths at all the depths we want to be included.
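The non-recursive behaviour of a single * can be seen with Python's own glob module, used here only as a stand-in for illustration (it is not the packaging code itself, and the directory layout is invented). One pattern is needed per depth.

```python
import glob
import os
import tempfile


def list_matches():
    """Build data/top.dat and data/nested/deep.dat, then glob at two depths."""
    with tempfile.TemporaryDirectory() as root:
        nested = os.path.join(root, "data", "nested")
        os.makedirs(nested)
        for path in (os.path.join(root, "data", "top.dat"),
                     os.path.join(nested, "deep.dat")):
            with open(path, "w") as handle:
                handle.write("x")

        # A single * stays within one directory level.
        one_level = glob.glob(os.path.join(root, "data", "*.dat"))
        # Files one level deeper need their own pattern.
        two_levels = glob.glob(os.path.join(root, "data", "*", "*.dat"))

        return ([os.path.basename(p) for p in one_level],
                [os.path.basename(p) for p in two_levels])


one_level, two_levels = list_matches()
print(one_level, two_levels)  # ['top.dat'] ['deep.dat']
```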

🔎 Tracing how a piece of code changed throughout the history of a project: Thanks to git, which provides numerous options with the log command, you can find all the commits that affected a string (be it a function name, or any other uniquely-identifying part of the code you want to search). It came in handy when I needed to know why the function listed in an error message of our package doesn't exist in the codebase.

📝 How to edit and install a package simultaneously: Earlier I used to edit code files of the installed package in /../site-packages/ (which is even referred to as bad practice) but I couldn't track the changes I made because it is not in a git repo. So I looked for a better approach on the internet and discussed with my mentor to find that I can use setup.py develop instead of installing the package. This creates a .egg-link in site-packages back to the project source code directory so that you can see the changes directly without having to reinstall every time you make an edit in the package.

 


Thank you for reading. Stay tuned to learn more things with me!
