from scipy2018 import PySAL==2.0rc2

Returning from scipy 2018

My wrap-up from an exciting week full of stories about code and community at scipy 2018. First: I am even more convinced to keep on working on open source software in a fantastic community where scipy conference t-shirts are worn somewhat similar to festival t-shirts. Second: I was very impressed by the amount of new ideas, creative solutions and jokes (reference to the ‘catterplot’ and all lightning-talks) to encounter at the conference. Lastly, Dani asked me during the end of the sprints, “What changed your life this week?” – “Of course meeting the pysal team, my mentors and experiencing the spirit of coding together to finish the release of splot!”

I also had the change to introduce splot to the broader python community in a lightning talk, which you can see if you follow the link.

First splot release

This big announcement this week is that we successfully preliminary released splot as part of PySAL 2.0rc2.

You can install and access splot via PySAL 2.0:

pip install PySAL==2.0rc2
You can download PySAL-2.0 file from pypi.org. Release notes and some statistics about PySAL - 2.0 can be accessed here. More information about migrating to PySAL 2.0 can be found here. And our brand new team website which can be accessed here (which is partly still in the making).

Extended functionality

Next to these exciting news, I have continued extending and fine-tuning splot‘s functionality to make it more user friendly. For example, you can now use splot.esda.moran_scatterplot() to plot all esda.moranobjects, instead of calling functions specific to Moran, Moran_Local, Moran_BV, ….
from splot.esda import moran_scatterplot

fig, axs = plt.subplots(2, 2, figsize=(10,10),
                        subplot_kw={'aspect': 'equal'})

moran_scatterplot(moran, p=0.05, ax=axs[0,0])
moran_scatterplot(moran_loc, p=0.05, ax=axs[1,0])
moran_scatterplot(moran_bv, p=0.05, ax=axs[0,1])
moran_scatterplot(moran_loc_bv, p=0.05, ax=axs[1,1])
plt.show()
 
Furthermore, I implemented moran_facet() which allows to plot moran statistics calculated for a variety of attributes:
from splot.esda import moran_facet

fig, axarr = moran_facet(moran_matrix)
plt.show()

Designing the splot API

The last two weeks of GSoC with PySAL and `splot` were all about designing the splot package structure and creating a common API. The decisions made during this time provide a preliminary blueprint for all visualisations and functionality to come.

In this blogpost, I will provide a summary of the decisions that were made in the process. If you are interested in having a closer look, the discussions between developers, the community and student are openly accessible in our GSoC 2018 project on the splot repository.  A collection of the functionality that will be supported in future can be found here and the decisions how the package structure and API will look like will be made here.

The splot package structure and API (as it currently stands):

In future splot‘s functionality can be accessed in two ways.

First of all, basic splot visualizations are exposed as .plotmethods on PySAL objects. For example:

from giddy.directional import Rose

...(Data preparation)
rose = Rose(Y, w)
rose.plot_heatmap()
plt.show()

Furthermore, all basic visualisations and more (to be defined) can be called with sub-package namespaces. For example:

from splot.giddy import dynamic_lisa_composite
from splot.esda import ...

Lastly, the majority of all PySAL developers prefers Matplotlib as the default backend. I therefore decided to focus on implementing most visualisations in a Matplotlib version first before continuing to develop the Bokeh backend.

Why we decided to prioritise matplotlib over bokeh

I found out over the last weeks, that it took much more of my time to implement a visualisation in bokeh rather than in matplotlib. I can think of four main reasons for this:

  1. I am more familiar with Matplotlib and naturally quicker in creating visualisations with this backend.
  2. Matplotlib seems to have better documentation and more examples on stackoverflow and many other places which speeds up the implementation of specialised visualisations.
  3. I found a couple of things that were harder to implement in Bokeh, seemingly because of missing features and design choices.
  4. There are more packages that already build on Matplotlib, like Geopandas .plot functionality or Seaborn that can be leveraged.

When visualising geographical data in maps, it is for example important to autoscale plots to keep their aspect ratio when the plotted figure changes size or format. This helps avoid distortion in spatially explicit representations.

To achieve this in matplotlib one can simply add the line:

ax.set_aspect('equal')

In Bokeh however that did not seem to be implemented yet, so I had to create this utility function:

def calc_data_aspect(plot_height, plot_width, bounds):
  # Deal with data ranges in Bokeh:
  # make a meter in x and y the same in pixel lengths
  aspect_box = plot_height / plot_width # 2 / 1 = 2
  xmin, ymin, xmax, ymax = bounds
  x_range = xmax - xmin # 1 = 1 - 0
  y_range = ymax - ymin # 3 = 3 - 0
  aspect_data = y_range / x_range # 3 / 1 = 3

  if aspect_data > aspect_box:
    # we need to increase x_range,
    # such that aspect_data becomes equal to aspect_box
    halfrange = 0.5 * x_range * (aspect_data/aspect_box-1)
    # 0.5 * 1 * (3 / 2 - 1) = 0.25
    xmin -= halfrange # 0 - 0.25 = -0.25
    xmax += halfrange # 1 + 0.25 = 1.25
  else:
    # we need to increase y_range
    halfrange = 0.5 * y_range * (aspect_box/aspect_data-1)
    ymin -= halfrange
    ymax += halfrange

  # Add a bit of margin to both x and y
  margin = 0.03
  xmin -= (xmax - xmin) / 2 * margin
  xmax += (xmax - xmin) / 2 * margin
  ymin -= (ymax - ymin) / 2 * margin
  ymax += (ymax - ymin) / 2 * margin
  return xmin, xmax, ymin, ymax

It was also easier for me to use  Interact Jupyter notebook widgets with Matplotlib rather than Bokeh. In Matplotlib Interact simply regenerates the whole figure and was easy to implement:

interact(_dynamic_lisa_widget_update,
         timex=coldict, timey=coldict, rose=fixed(rose),
         gdf=fixed(gdf), p=fixed(p), figsize=fixed(figsize)
         )

With Bokeh as a backend one needs to update the datasource rather than the figure. This is much faster (the whole figure does not have to be re-drawn), but it is harder to implement because the datasource is hidden inside the previous plotting functions and therefore not directly accessible by Interact.

In conclusion, Bokeh generally is a great tool to make one off customised visualisations but generally seems to not fully support implementing library functions easily yet.

Another reason is that using Bokeh in Jupyter Lab required to load a Jupyter Lab extension which for some reason did not build for me. Nevertheless, Bokeh is still a relatively young package with great potential and I will definitely explore it and other more interactive visualisation libraries later in GSoC.

Example of API usage

Now that the preliminary API design is chosen, I started to implement functions according to this pattern. This pull request contains five first functions and their documentation for giddy.directional. This is an example of the end result for one of the five:

GSoC Milestone 1: Visualising Spatial Autocorrelation

Exploratory Analysis of spatial data – Visualising Spatial Autocorrelation

Maps can be powerful tools to analyze attribute patterns in space. You could for example ask the questions: If you are likely to donate money for a good cause, is your neighbor as well? Or where are the greediest people living? Is there a connection between the willingness to donate money for a good cause and the geographical space you live in? Or is it just all random if people donate or not? Lastly, you would like your neighbors to have the same intrinsic ideas to share their wealth with others, so where should you move to, to make sure you end up in the right place? Thanks to GSoC the PySAL development team and I were able to create a visualization package called splot, that helps us to answer these questions.

A visual inspection of a Choropleth map (a map with the containing polygons colored according to their attribute value), showing how much money people in France donated for a good cause on a yearly average in the 1830s, allows us to already see spatial structure (data: Guerry). If the values would be completely random, we would see no dark or light clusters on the map:

# imports you will need:
import matplotlib.pyplot as plt
from bokeh.io import show, output_notebook
import libpysal.api as lp
import geopandas as gpd
import esda
from libpysal import examples

# load example data
link_to_data = examples.get_path('Guerry.shp')
df = gpd.read_file(link_to_data)

# calculate Local Moran Statistics for 'Donatns'
y = df['Donatns'].values
w = lp.Queen.from_dataframe(df) # Queen weights
w.transform = 'r'
moran_loc = esda.moran.Moran_Local(y, w)

# load Splot Choropleth functionality and plot
from splot.bk import plot_choropleth
fig = plot_choropleth(df, 'Donatns', reverse_colors=True)
show(fig)

Your brains can hereby help us identify these clusters. But careful, sometimes we tend to detect patterns where there is no statistical correlation between the values of neighboring polygons. This can especially be the case when these polygons are of different shapes and sizes. In order to make it a bit easier for our brain to detect clusters where there are statistically significant clusters, we can use Local Spatial Autocorrelation Statistics to identify so called hot and cold-spots on the map. There are many different methods of Local Spatial Autocorrelation Statistics. However, they all combine the idea of calculating two similarities: The similarity of space and the similarity of a certain attribute. (Note: we won’t dive too deep into Spatial Autocorrelation, but if you are interested in it I can highly recommend to check out the geopython tutorial offered by PySAL’s development team.)
In our case we can simply use PySAL and Esda to calculate Local Moran values. With Splot we can now plot the resulting Moran Scatterplot in combination with a LISA cluster map, indicating hot and cold spots as well as outliers, and the original Choropleth showcasing the values:

# Load splot plot_local_autocorrelation() functionality and plot
from splot.mpl import plot_local_autocorrelation
fig = plot_local_autocorrelation(moran_loc, df, "Donatns", legend=True)
plt.show()

Let’s assume further, you have now picked a particular region you have heard has beautiful nature and you would like to check locally, if people there are statistically more likely to donate larger sums. You can simply use two masking options in splot and the plot_local_autocorrelation() function in order to find out how your favorite Region “Ain” is doing (region-masking) and where all other regions with similar statistical values can be found (quadrant-masking):

# use plot_local_autocorrelation() with mask options
fig = plot_local_autocorrelation(moran_loc, df, "Donatns", legend=True, region_column='Dprtmnt', mask=['Rhone', 'Ain', 'Hautes-Alpes'], quadrant=3)
plt.show()

Lastly, you have discovered it is actually way more important to you if your neighbors are likely to share a glass of good French red wine with you in the evening instead of how much they donate. No problem, you can use any other geopandas dataframe, e.g. containing information about the wine consumption per year per region, and repeat the analysis.

The code above uses Matplotlib as the main plotting library, you can however also use our interactive Bokeh version.

Note: Since we are still in the development phase of splot, this relies for now on the master branch of geopandas, on a pull request to libpysal and the main pull request to splot.