Designing the splot API

The last two weeks of GSoC with PySAL and `splot` were all about designing the splot package structure and creating a common API. The decisions made during this time provide a preliminary blueprint for all visualisations and functionality to come.

In this blog post, I will summarise the decisions that were made in the process. If you would like a closer look, the discussions between the developers, the community and me as the student are openly accessible in our GSoC 2018 project on the splot repository. A collection of the functionality that will be supported in future can be found here, and the decisions on what the package structure and API will look like will be made here.

The splot package structure and API (as it currently stands):

In future, splot's functionality can be accessed in two ways.

First of all, basic splot visualisations are exposed as `.plot` methods on PySAL objects. For example:

import matplotlib.pyplot as plt
from giddy.directional import Rose

# ... (data preparation: defines Y and w)
rose = Rose(Y, w)
rose.plot_heatmap()
plt.show()

Furthermore, all basic visualisations and more (to be defined) can be imported from sub-package namespaces. For example:

from splot.giddy import dynamic_lisa_composite
from splot.esda import ...
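
The two access routes can share one implementation if the `.plot` method is only a thin wrapper around the sub-package function. The sketch below illustrates that idea with stand-in names: this `Rose` class and `rose_heatmap` function are hypothetical placeholders, not the actual giddy or splot code, and the function returns a plain dictionary instead of drawing anything.

```python
# Hypothetical sketch: these names are illustrative stand-ins,
# not the real giddy or splot implementations.

def rose_heatmap(rose, ax=None):
    """Stand-in for a function living in a visualisation sub-package."""
    # A real version would draw with Matplotlib; here we only return a
    # description so that the example has no dependencies.
    return {"kind": "heatmap", "n_sectors": rose.k, "ax": ax}


class Rose:
    """Stand-in for an analysis object that exposes a .plot method."""

    def __init__(self, k):
        self.k = k

    def plot_heatmap(self, ax=None):
        # Thin wrapper: forward to the shared function so that both
        # access routes always produce the same result.
        return rose_heatmap(self, ax=ax)


rose = Rose(k=8)
via_method = rose.plot_heatmap()
via_function = rose_heatmap(rose)
assert via_method == via_function
```

Keeping the drawing logic in one function means a fix or a new keyword argument only has to be made in one place.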

Lastly, the majority of PySAL developers prefer Matplotlib as the default backend. I therefore decided to focus on implementing most visualisations in a Matplotlib version first before continuing to develop the Bokeh backend.

Why we decided to prioritise Matplotlib over Bokeh

Over the last few weeks, I found that implementing a visualisation in Bokeh took much more of my time than implementing it in Matplotlib. I can think of four main reasons for this:

  1. I am more familiar with Matplotlib and naturally quicker in creating visualisations with this backend.
  2. Matplotlib seems to have better documentation and more examples on Stack Overflow and in many other places, which speeds up the implementation of specialised visualisations.
  3. I found a couple of things that were harder to implement in Bokeh, seemingly because of missing features and design choices.
  4. There are more packages that already build on Matplotlib and can be leveraged, like the GeoPandas `.plot` functionality or Seaborn.

For example, when visualising geographical data in maps, it is important that plots keep their aspect ratio when the plotted figure changes size or format. This helps avoid distortion in spatially explicit representations.

To achieve this in Matplotlib, one can simply add the line:

ax.set_aspect('equal')

In Bokeh, however, this did not seem to be implemented yet, so I had to create this utility function:

def calc_data_aspect(plot_height, plot_width, bounds):
  # Deal with data ranges in Bokeh:
  # make a meter in x and y the same in pixel lengths
  aspect_box = plot_height / plot_width # 2 / 1 = 2
  xmin, ymin, xmax, ymax = bounds
  x_range = xmax - xmin # 1 = 1 - 0
  y_range = ymax - ymin # 3 = 3 - 0
  aspect_data = y_range / x_range # 3 / 1 = 3

  if aspect_data > aspect_box:
    # we need to increase x_range,
    # such that aspect_data becomes equal to aspect_box
    halfrange = 0.5 * x_range * (aspect_data/aspect_box-1)
    # 0.5 * 1 * (3 / 2 - 1) = 0.25
    xmin -= halfrange # 0 - 0.25 = -0.25
    xmax += halfrange # 1 + 0.25 = 1.25
  else:
    # we need to increase y_range
    halfrange = 0.5 * y_range * (aspect_box/aspect_data-1)
    ymin -= halfrange
    ymax += halfrange

  # Add a bit of margin to both x and y.
  # Compute each pad once so both sides get the same amount.
  margin = 0.03
  x_pad = (xmax - xmin) / 2 * margin
  y_pad = (ymax - ymin) / 2 * margin
  xmin -= x_pad
  xmax += x_pad
  ymin -= y_pad
  ymax += y_pad
  return xmin, xmax, ymin, ymax
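
As a quick sanity check of the idea, here is a compact restatement of the same range adjustment without the margin step. The name `fit_bounds_to_box` is my own for this sketch and not part of splot. It widens the shorter data range around its centre until the data aspect matches the box aspect, and it is run on the same numbers as in the comments above (a 2:1 box and data spanning 1 by 3).

```python
def fit_bounds_to_box(plot_height, plot_width, bounds):
    """Widen bounds around their centre so data aspect equals box aspect."""
    xmin, ymin, xmax, ymax = bounds
    aspect_box = plot_height / plot_width
    x_range = xmax - xmin
    y_range = ymax - ymin
    if y_range / x_range > aspect_box:
        # Data is too tall for the box: widen the x range.
        x_range = y_range / aspect_box
    else:
        # Data is too wide for the box: widen the y range.
        y_range = x_range * aspect_box
    x_mid = (xmin + xmax) / 2
    y_mid = (ymin + ymax) / 2
    return (x_mid - x_range / 2, x_mid + x_range / 2,
            y_mid - y_range / 2, y_mid + y_range / 2)


xmin, xmax, ymin, ymax = fit_bounds_to_box(2, 1, (0, 0, 1, 3))
print(xmin, xmax, ymin, ymax)  # -0.25 1.25 0.0 3.0
print((ymax - ymin) / (xmax - xmin))  # 2.0, the aspect of the box
```

The x range grows from 1 to 1.5 while the y range stays at 3, so one data unit covers the same number of pixels in both directions.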

It was also easier for me to use Jupyter notebook Interact widgets with Matplotlib than with Bokeh. With Matplotlib, Interact simply regenerates the whole figure, which was easy to implement:

from ipywidgets import interact, fixed

interact(_dynamic_lisa_widget_update,
         timex=coldict, timey=coldict, rose=fixed(rose),
         gdf=fixed(gdf), p=fixed(p), figsize=fixed(figsize))

With Bokeh as a backend, one needs to update the data source rather than the figure. This is much faster (the whole figure does not have to be redrawn), but it is harder to implement because the data source is hidden inside the existing plotting functions and therefore not directly accessible to Interact.
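
One possible way around this, sketched here in plain Python, is to have each plotting function return its data source alongside the figure so that a widget callback can reach it. Everything in this block is a hypothetical illustration: `DataSource` is a minimal stand-in for Bokeh's `ColumnDataSource`, and `scatter_plot` and `update` are made-up names, not splot functions.

```python
class DataSource:
    """Minimal stand-in for a Bokeh ColumnDataSource (not Bokeh's class)."""

    def __init__(self, data):
        self.data = dict(data)


def scatter_plot(x, y):
    """Hypothetical plotting function that also returns its data source."""
    source = DataSource({"x": x, "y": y})
    figure = {"glyph": "scatter", "source": source}  # placeholder figure
    return figure, source


def update(source, x, y):
    """Callback a widget could call: replace the data, keep the figure."""
    source.data = {"x": x, "y": y}


fig, source = scatter_plot([1, 2, 3], [4, 5, 6])
update(source, [1, 2, 3], [6, 5, 4])
assert fig["source"].data["y"] == [6, 5, 4]  # the figure sees the new data
```

The figure object is created once and only the data changes, which is the reason this approach avoids redrawing everything.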

In conclusion, Bokeh is a great tool for making one-off customised visualisations, but it does not yet seem to fully support implementing library functions easily.

Another reason is that using Bokeh in JupyterLab required loading a JupyterLab extension, which for some reason did not build for me. Nevertheless, Bokeh is still a relatively young package with great potential, and I will definitely explore it and other, more interactive visualisation libraries later in GSoC.

Example of API usage

Now that the preliminary API design has been chosen, I have started implementing functions according to this pattern. This pull request contains the first five functions and their documentation for giddy.directional. This is an example of the end result for one of the five:
