Lessons learned: implementing `splot`

In my last blog post I explained the new API structure the splot team decided on. During my last two weeks of GSoC I have been working on implementing this API for the giddy and esda sub-packages. This functionality will be available in the splot release, which will likely come at the end of this GSoC.

This is an example of how the new API can be used in the future:

Imports

import esda
import matplotlib.pyplot as plt
import libpysal.api as lp
from libpysal import examples
import geopandas as gpd

Data preparation and statistical analysis

link = examples.get_path('columbus.shp')
gdf = gpd.read_file(link)
y = gdf['HOVAL'].values
w = lp.Queen.from_dataframe(gdf)
w.transform = 'r'
mloc = esda.moran.Moran_Local(y, w)

Plotting methods

mloc.plot(gdf, 'HOVAL')
mloc.plot(gdf, 'HOVAL', p=0.05, region_column='POLYID', mask=['1', '2', '3'], quadrant=1)
plt.show()

A new addition to the existing plotting functionality is the use of **kwargs as a function parameter. This allows users to customise any splot plot, or any PySAL plotting method leveraging splot, by passing in common keywords that the underlying Matplotlib function accepts (e.g. color, c, marker size, linewidth). By providing both sensible defaults and the possibility to change the style of a visualisation according to the user's needs, splot supports quick exploratory use as well as customisation for publication.
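The idea can be sketched as follows. The function name and defaults here are hypothetical and not splot's actual internals; the point is how defaults and user-supplied keywords are merged and forwarded to Matplotlib:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def plot_values(y, **kwargs):
    """Hypothetical sketch of a splot-style plotting function."""
    # sensible defaults first ...
    style = {'color': 'tab:blue', 'marker': 'o'}
    # ... overridden by whatever the user passes in
    style.update(kwargs)
    fig, ax = plt.subplots()
    # everything is forwarded to the underlying Matplotlib call
    ax.scatter(range(len(y)), y, **style)
    return fig, ax


# quick exploratory use with defaults:
fig, ax = plot_values([3, 1, 4, 1, 5])
# customised for publication, e.g. red markers of size 80:
fig, ax = plot_values([3, 1, 4, 1, 5], color='red', s=80)
```

Because the user's keywords are applied last, they always win over the defaults, which is what makes the same function usable for both quick checks and polished figures.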

Lessons Learned: Writing Library Code

During the first half of GSoC I dove head first into the world of open source code and spatial statistics. I was not aware that writing code for a library involves much more than knowing how to write code and how to use git. In this section I would like to share my experiences and tips for writing open source library code with you:

My workflow for creating new functionality:

  1. It starts with the fun: write some code!
  2. Create a function from the code if that has not already happened. Usually I iterate on a function until I am happy with the visual output. I therefore use .py files for my source code and Jupyter notebooks to check the generated visualisations. Jupyter Lab, for example, provides a fantastic tool to work on both at once.
  3. After drafting the main functionality, I think about the elements that should be made configurable for a user, such as colours, design elements and masking options, and implement my ideas.
  4. When the functionality is thought out and ready to be used, I write a docstring containing parameters, returns and examples. Sphinx will later use this to create the complete documentation for the package.
  5. Now I write a unit test that exercises the function and ensures that it keeps working when functionality is added or changed in the future. These unit tests are run on Travis CI whenever someone makes changes to splot in a pull request.
  6. Finally, my functionality is ready to meet splot. I make a pull request and check that Travis CI gives me a green light.
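Steps 4 and 5 can be sketched together. The function and its test below are hypothetical stand-ins, but the docstring follows the numpydoc convention that Sphinx can render, and the test is a plain function whose name starts with test, which is how nosetests finds it:

```python
def plot_choropleth(gdf, attribute, cmap='viridis'):
    """Plot a choropleth map of an attribute (hypothetical sketch).

    Parameters
    ----------
    gdf : geopandas.GeoDataFrame
        Dataframe containing the geometries to plot.
    attribute : str
        Name of the column holding the values of interest.
    cmap : str, optional
        Name of a Matplotlib colormap. Default is 'viridis'.

    Returns
    -------
    values : list
        The values that would be mapped to colours.

    Examples
    --------
    >>> plot_choropleth(gdf, 'HOVAL')  # doctest: +SKIP
    """
    # a real implementation would draw the map; here we only
    # return the values so the sketch stays self-contained
    return list(gdf[attribute])


def test_plot_choropleth():
    # nosetests collects any function whose name starts with 'test'
    fake_gdf = {'HOVAL': [80.5, 44.6, 26.4]}
    assert plot_choropleth(fake_gdf, 'HOVAL') == [80.5, 44.6, 26.4]
```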

Tips that optimised this workflow, saved me a lot of time and helped me debug my library code:

  • While working in a Jupyter notebook or Jupyter Lab with code from a Python file, Jupyter does not pick up any changes once it has imported a function. This slows down the process of iteratively changing a function and checking the resulting visualisation output. Using reload allowed me to make edits in the Python file and quickly check the resulting changes in Jupyter without restarting the entire kernel:

import splot
from importlib import reload
reload(splot)
  • Running pyflakes on the project repository in the terminal helped me to clean up my code when it got a little messy. pyflakes picks up on loads of common errors like missing imports, unused imports and unused variables, allowing for a quick clean-up before making a pull request.
  • Similarly, running pep8 makes sure my code and documentation conform to the Python style guide.
  • Running nosetests . in my local project folder runs all the unit tests in the same way as they are run on Travis CI. This provides an additional check that changes to the code are correct before I commit or make a pull request.
  • Failing tests on Travis CI can have many reasons. Running nosetests locally helps to check whether my code is the problem or whether, for example, the .travis.yml file needs to be updated (e.g. missing dependencies for new functionality). This file basically tells Travis CI which packages to install and how to run the tests. When Travis CI is still failing, check the log, which shows you the details of why the testing failed. If it gets really tricky, you can recreate the Travis CI run by executing the different steps of the .yml file locally, one by one, which will give you an indication of where things start to go wrong.
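To make the last tip concrete, a minimal .travis.yml might look roughly like this. The Python version and package list are purely illustrative, not splot's actual configuration:

```yaml
language: python
python:
  - "3.6"
install:
  # dependencies first, then the package itself in editable mode
  - pip install numpy matplotlib libpysal esda geopandas
  - pip install -e .
script:
  # run the same command you would run locally
  - nosetests .
```

When a build fails, running each of these install and script lines locally, in order, usually reveals which step breaks.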
