GSoC Milestone 1: Visualising Spatial Autocorrelation

Exploratory Analysis of spatial data – Visualising Spatial Autocorrelation

Maps can be powerful tools for analyzing attribute patterns in space. You could, for example, ask: if you are likely to donate money to a good cause, is your neighbor likely to as well? Where do the greediest people live? Is there a connection between the willingness to donate and the place you live in, or is it purely random whether people donate? And if you would like your neighbors to share your attitude towards giving, where should you move to end up in the right place? Thanks to GSoC, the PySAL development team and I were able to create a visualization package called splot that helps us answer these questions.

A visual inspection of a choropleth map (a map whose polygons are colored according to their attribute value) showing how much money people in France donated to good causes on yearly average in the 1830s already reveals spatial structure (data: Guerry). If the values were completely random, we would see no dark or light clusters on the map:

# imports you will need:
import matplotlib.pyplot as plt
from bokeh.io import show, output_notebook
import libpysal.api as lp
import geopandas as gpd
import esda
from libpysal import examples

# load example data
link_to_data = examples.get_path('Guerry.shp')
df = gpd.read_file(link_to_data)

# calculate Local Moran Statistics for 'Donatns'
y = df['Donatns'].values
w = lp.Queen.from_dataframe(df) # Queen weights
w.transform = 'r'
moran_loc = esda.moran.Moran_Local(y, w)

# load Splot Choropleth functionality and plot
from splot.bk import plot_choropleth
fig = plot_choropleth(df, 'Donatns', reverse_colors=True)
show(fig)

Our brains are good at picking out such clusters. But be careful: we sometimes detect patterns where there is no statistical correlation between the values of neighboring polygons, especially when the polygons differ in shape and size. To make it easier to see only the statistically significant clusters, we can use local spatial autocorrelation statistics to identify so-called hot and cold spots on the map. There are many different local spatial autocorrelation statistics, but they all combine two kinds of similarity: similarity in space and similarity in a certain attribute. (Note: we won’t dive too deep into spatial autocorrelation here, but if you are interested, I highly recommend the geopython tutorial offered by PySAL’s development team.)
In our case we can simply use PySAL and esda to calculate local Moran values. With splot we can then plot the resulting Moran scatterplot together with a LISA cluster map, indicating hot and cold spots as well as outliers, and the original choropleth showing the values:

# Load splot plot_local_autocorrelation() functionality and plot
from splot.mpl import plot_local_autocorrelation
fig = plot_local_autocorrelation(moran_loc, df, "Donatns", legend=True)
plt.show()
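
To make the idea of combining similarity in space with similarity in an attribute more concrete, here is a minimal NumPy sketch of a textbook local Moran's I. It is an illustration only, not the code esda runs: the function name local_moran, the four toy regions and their values are all assumptions made for this example, and esda may scale the statistic slightly differently.

```python
import numpy as np

def local_moran(y, w):
    """Textbook local Moran's I for values y and a row-standardised weights matrix w.

    Hypothetical helper for illustration; not part of esda or splot.
    """
    y = np.asarray(y, dtype=float)
    z = y - y.mean()               # attribute side: deviation from the overall mean
    lag = w @ z                    # spatial side: mean deviation of each region's neighbours
    m2 = (z ** 2).sum() / len(y)   # variance-like scaling term
    local_i = z * lag / m2
    # Moran scatterplot quadrants, numbered as in PySAL: 1 HH, 2 LH, 3 LL, 4 HL
    quadrant = np.select(
        [(z > 0) & (lag > 0), (z <= 0) & (lag > 0), (z <= 0) & (lag <= 0)],
        [1, 2, 3],
        default=4,
    )
    return local_i, quadrant

# Four made-up regions in a row; each one neighbours the next.
adjacency = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
w = adjacency / adjacency.sum(axis=1, keepdims=True)  # row-standardise, like w.transform = 'r'
y = [10, 9, 2, 1]

local_i, quadrant = local_moran(y, w)
print(local_i.round(3))
print(quadrant)
```

In this toy example the two high-valued regions sit next to each other and land in quadrant 1 (high-high), while the two low-valued regions land in quadrant 3 (low-low). Significance testing, which esda does through permutations, is left out of the sketch.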

Let’s further assume you have picked a particular region that you have heard has beautiful nature, and you would like to check whether people there are statistically more likely to donate larger sums. You can use the two masking options of the plot_local_autocorrelation() function in splot to find out how your favorite region, “Ain”, is doing (region masking) and where all other regions with similar statistical values can be found (quadrant masking):

# use plot_local_autocorrelation() with mask options
fig = plot_local_autocorrelation(moran_loc, df, "Donatns", legend=True, region_column='Dprtmnt', mask=['Rhone', 'Ain', 'Hautes-Alpes'], quadrant=3)
plt.show()
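
Conceptually, the two masks boil down to boolean selections over the regions. The pandas sketch below illustrates that logic only; it is not how splot implements masking, and the table, its quadrant values and the names Dept-X and Dept-Y are made up for the example rather than taken from the Guerry results.

```python
import pandas as pd

# Made-up table: one row per region with a Moran scatterplot quadrant
# (1 HH, 2 LH, 3 LL, 4 HL). These values are NOT results from the Guerry data.
toy = pd.DataFrame({
    "Dprtmnt": ["Ain", "Rhone", "Hautes-Alpes", "Dept-X", "Dept-Y"],
    "quadrant": [3, 3, 1, 3, 2],
})

# Region masking: keep only the regions named explicitly.
region_mask = toy["Dprtmnt"].isin(["Ain"])

# Quadrant masking: keep every region that falls into the chosen quadrant.
quadrant_mask = toy["quadrant"] == 3

print(toy.loc[region_mask, "Dprtmnt"].tolist())
print(toy.loc[quadrant_mask, "Dprtmnt"].tolist())
```

The first selection highlights the regions you asked for by name; the second highlights every region that shares the chosen quadrant, whether or not you named it.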

Lastly, you may discover that it actually matters far more to you whether your neighbors are likely to share a glass of good French red wine with you in the evening than how much they donate. No problem: you can use any other geopandas dataframe, e.g. one containing information about the wine consumption per year per region, and repeat the analysis.

The code above uses Matplotlib as the main plotting library; you can, however, also use our interactive Bokeh version.

Note: Since splot is still in its development phase, this currently relies on the master branch of geopandas, on a pull request to libpysal and on the main pull request to splot.

from GSoC import splot

This blog post will introduce you to the splot package, how it is designed and what exactly I will be working on during the Google Summer of Code.

What is splot?

The goal of the splot package is to meet the growing demand for a simple-to-use, lightweight interface that connects PySAL to different popular visualization toolkits like bokeh, matplotlib or folium. The splot package will ultimately provide users with both static plots ready for publication and interactive visualizations that allow for quick iteration over ideas and data exploration. Please visit the viz module website for more detailed information and first examples of a possible design for such a package.

Design and components of splot

The splot package is ultimately structured into three levels. The highest level directly provides visualization functions for end users. The two lower levels lay the groundwork for easy visualization by, first, converting PySAL geometries (polygon, line, shape) into Matplotlib geometries and, second, allowing for subsetting (e.g. plotting only part of a .shp), aligning (e.g. using the same axes for different layers) and transforming (e.g. classifying values to colors) graphical objects. So far, the existing Moran plot provides a great example of what such functionality could look like.
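
As an illustration of the "transforming" step, the sketch below classifies attribute values into quantile classes and maps each class to a color, which is the basic recipe behind a choropleth. It is a hedged example, not splot's implementation: the function name classify_to_colors, the three hex colors and the input values are assumptions chosen for this post.

```python
import numpy as np

def classify_to_colors(values, palette):
    """Assign each value to a quantile class and look up that class's color.

    Hypothetical helper for illustration; not part of splot.
    """
    values = np.asarray(values, dtype=float)
    k = len(palette)
    # k - 1 interior quantile breaks split the values into k classes
    breaks = np.quantile(values, np.linspace(0, 1, k + 1)[1:-1])
    classes = np.digitize(values, breaks)  # class index 0 .. k-1 for each value
    colors = [palette[c] for c in classes]
    return classes, colors

# Arbitrary light-to-dark palette and made-up attribute values.
palette = ["#fee8c8", "#fdbb84", "#e34a33"]
classes, colors = classify_to_colors([1, 5, 2, 8, 9, 4], palette)

print(classes)
print(colors)
```

The resulting list of colors, one per polygon, is the kind of object a plotting layer could pass on to Matplotlib when drawing the geometries.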

GSoC project

Initial visualizations like LISA and choropleth maps have started to be developed in the splot package, but many functions remain to be coded. Besides refining existing plots, the general view is that the Matplotlib interface needs to be extended with new maps (Join Count BW, regression maps), scatter plots (pairwise regression plots, …) and many more visualizations.

Besides providing these missing static plots, the GSoC project will leverage more recent technologies from the constantly evolving visualization space. This geovisualization project provides the scope to incorporate interactive visualizations developed in Jupyter within the splot package. It allows for the exploration of potential new interfaces to alternative packages like Bokeh (plots with interactivity such as tooltips and zooming) and Folium (for plotting on top of web-sourced base layers, e.g. OpenStreetMap).

In the first phase of this project, we will therefore create different visualizations in both a static version with Matplotlib and an interactive version with Bokeh. Secondly, we will create a common API for easy access to both versions. After adding documentation, we will be able to provide a complete and user-friendly package. Finally, we will explore how alternative visualization packages, like Vega, could be integrated into the splot package in the future.

Additionally, we will refactor the package to ensure all functionality and documentation can be accessed in the splot namespace and work towards its inclusion into the PySAL user guide.

Hello world

My name is Stefanie Lumnitz and I am excited about developing a geo-visualization package in 2018’s Google Summer of Code. In my daily life, I am an MSc candidate at the University of British Columbia studying urban green spaces through deep neural networks. For GSoC, however, I will be joining the Python Software Foundation, in particular the PySAL development team and its open source community, in order to design and implement the splot package. In the future, splot will provide a lightweight interface that connects PySAL to different popular visualization toolkits.

During this summer project I would like to share my experiences from GSoC, some insights into the PySAL community, tips on getting started with open source programming and of course some great geo-spatial visualizations with you.