Value-by-Alpha maps with `splot`

The last two weeks of GSoC were still fully dedicated to expanding splot's functionality. For example, you can now create so-called Value-by-Alpha maps using splot.mapping.

What is a Value-by-Alpha choropleth?

In a nutshell, a Value-by-Alpha choropleth is a bivariate choropleth that uses the values of a second input variable y as a transparency mask, determining how much of the choropleth displaying the values of a first variable x is shown. Unlike a cartogram, a Value-by-Alpha choropleth does not distort shapes and sizes; instead it modifies the alpha channel (transparency) of each polygon according to the second input variable y.
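To make the idea concrete, here is a minimal, dependency-free sketch of how two variables can be combined into one RGBA color per polygon. It is not splot's implementation: the helper names `rescale` and `value_by_alpha`, the two-color ramp and the toy data are all made up for illustration, and the sketch simply assumes that larger y values should be more opaque.

```python
def rescale(values):
    """Linearly rescale a sequence of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def value_by_alpha(x, y, low_rgb=(0.0, 0.0, 1.0), high_rgb=(1.0, 0.0, 0.0)):
    """Return one (r, g, b, a) tuple per observation.

    x picks the color on a two-color ramp, y picks the opacity.
    Assumption of this sketch: larger y means more opaque.
    """
    colors = []
    for tx, ty in zip(rescale(x), rescale(y)):
        rgb = tuple(l + tx * (h - l) for l, h in zip(low_rgb, high_rgb))
        colors.append(rgb + (ty,))
    return colors


# hypothetical toy data, not taken from columbus.shp
x = [10.0, 20.0, 30.0]
y = [5.0, 0.0, 10.0]
rgba = value_by_alpha(x, y)
```

The second polygon has the smallest y, so it ends up fully transparent no matter which color x assigns to it.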

Value-by-Alpha functionality in splot

Imports you will need

import libpysal as lp
from libpysal import examples
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import numpy as np

from splot.mapping import vba_choropleth

Let’s prepare the data

Load example data into a geopandas.GeoDataFrame and inspect column names. In this example we will use the columbus.shp file containing neighborhood crime data of 1980.

link_to_data = examples.get_path('columbus.shp')
gdf = gpd.read_file(link_to_data)

We extract two arrays: x (housing value in $1,000) and y (residential burglaries and vehicle thefts per 1,000 households).

x = gdf['HOVAL'].values
y = gdf['CRIME'].values

We can now create a Value-by-Alpha map using `splot`’s `vba_choropleth` functionality.

Let’s plot a Value-by-Alpha Choropleth with `x` defining the rgb values and `y` defining the alpha value. For comparison we plot a choropleth of `x` with `gdf.plot()`:

# Create new figure
fig, axs = plt.subplots(1,2, figsize=(20,10))

# use gdf.plot() to create regular choropleth
gdf.plot(column='HOVAL', scheme='quantiles', cmap='RdBu', ax=axs[0])

# use vba_choropleth to create Value-by-Alpha Choropleth
vba_choropleth(x, y, gdf, rgb_mapclassify=dict(classifier='quantiles'),
               cmap='RdBu', ax=axs[1])

# set figure style
axs[0].set_title('normal Choropleth')
axs[1].set_title('Value-by-Alpha Choropleth')

# plot
plt.show()

You can see the original choropleth is fading into transparency wherever there is a high `y` value.

You can use the option to bin or classify your `x` and `y` values. `splot` uses mapclassify to bin your data and displays the new color and alpha ranges:

# Create new figure
fig, axs = plt.subplots(2,2, figsize=(20,10))

# classifier quantiles
vba_choropleth(y, x, gdf, cmap='viridis', ax = axs[0,0],
               rgb_mapclassify=dict(classifier='quantiles', k=3), 
               alpha_mapclassify=dict(classifier='quantiles', k=3))

# classifier natural_breaks
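vba_choropleth(y, x, gdf, cmap='viridis', ax = axs[0,1],
               rgb_mapclassify=dict(classifier='natural_breaks'),
               alpha_mapclassify=dict(classifier='natural_breaks'))

# classifier std_mean
vba_choropleth(y, x, gdf, cmap='viridis', ax = axs[1,0],
               rgb_mapclassify=dict(classifier='std_mean'),
               alpha_mapclassify=dict(classifier='std_mean'))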

# classifier fisher_jenks
vba_choropleth(y, x, gdf, cmap='viridis', ax = axs[1,1],
               rgb_mapclassify=dict(classifier='fisher_jenks', k=3), 
               alpha_mapclassify=dict(classifier='fisher_jenks', k=3))
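What a quantile classifier does to the alpha channel can be sketched with the standard library alone. This is only an illustration of the binning idea, not mapclassify's or splot's code: `quantile_classes` and `class_alphas` are hypothetical helpers, the data is made up, and spreading the classes evenly over the alpha range is an assumption of the sketch.

```python
from bisect import bisect_left
from statistics import quantiles


def quantile_classes(values, k):
    """Assign each value to one of k quantile classes (0 .. k-1)."""
    # k - 1 interior cut points; 'inclusive' treats min and max as part of the data
    cuts = quantiles(values, n=k, method="inclusive")
    # a value equal to a cut point falls into the lower class
    return [bisect_left(cuts, v) for v in values]


def class_alphas(classes, k):
    """Spread the k classes evenly over the alpha range [0, 1] (sketch assumption)."""
    return [c / (k - 1) for c in classes]


# hypothetical toy data, not taken from columbus.shp
y = [7, 1, 4, 3, 6, 2, 5]
classes = quantile_classes(y, k=3)
alphas = class_alphas(classes, k=3)
```

With k=3 every polygon receives one of only three alpha levels, which is why classified maps show distinct steps instead of a continuous fade.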

Sometimes it is important in geospatial analysis to actually see the high values and let the small values fade out. With the `revert_alpha = True` argument, you can invert the transparency mapping of the `y` values.

# Create new figure
fig, axs = plt.subplots(1,2, figsize=(20,10))

# create a vba_choropleth
vba_choropleth(x, y, gdf, rgb_mapclassify=dict(classifier='quantiles'),
               cmap='RdBu', ax=axs[0],
               revert_alpha = False)

# set revert_alpha argument to True
vba_choropleth(x, y, gdf, rgb_mapclassify=dict(classifier='quantiles'),
               cmap='RdBu', ax=axs[1],
               revert_alpha = True)

# set figure style
axs[0].set_title('revert_alpha = False')
axs[1].set_title('revert_alpha = True')

# plot
plt.show()

You can use the `divergent` argument to display divergent alpha values. Values at the extremes of your data range are then displayed with an alpha value of 1, while values towards the middle of the range become increasingly transparent, approaching an alpha value of 0.

# create new figure
fig, axs = plt.subplots(1,2, figsize=(20,10))

# create a vba_choropleth
vba_choropleth(y, x, gdf, cmap='RdBu',
               divergent=False, ax=axs[0])

# set divergent to True
vba_choropleth(y, x, gdf, cmap='RdBu',
               divergent=True, ax=axs[1])

# set figure style
axs[0].set_title('divergent = False')
axs[1].set_title('divergent = True')

# plot
plt.show()
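The inverted and divergent alpha mappings described above can be written as small functions of a value t that has already been rescaled to [0, 1]. The function names and formulas below are assumptions chosen to match the descriptions in this post; splot's internal formulas may differ.

```python
def linear_alpha(t):
    """Alpha grows linearly with the rescaled value t in [0, 1]."""
    return t


def inverted_alpha(t):
    """Flip the linear mapping: what was opaque becomes transparent."""
    return 1.0 - t


def divergent_alpha(t):
    """Opaque at both extremes (t = 0 or t = 1), transparent in the middle."""
    return abs(2.0 * t - 1.0)


# rescaled example values, purely illustrative
ts = [0.0, 0.25, 0.5, 0.75, 1.0]
divergent = [divergent_alpha(t) for t in ts]
inverted = [inverted_alpha(t) for t in ts]
```

In the divergent case the value in the middle of the range (t = 0.5) is the only one that disappears completely.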

Lastly, if your values are classified, you have the option to add a legend to your map:

fig = plt.figure(figsize=(15,10))
ax = fig.add_subplot(111)
vba_choropleth(x, y, gdf,
               alpha_mapclassify=dict(classifier='quantiles', k=5),
               rgb_mapclassify=dict(classifier='quantiles', k=5),
               legend=True, ax=ax)

Since `PySAL` is currently being refactored, merging and cleaning PRs has become a little more challenging. This new functionality still depends on PR #28, which will be finalised soon.

Outlook to the last week of GSoC

Next week is already the last week of GSoC. I am sad that this great summer is almost over, but I am looking forward to continuing to work on splot and PySAL in the future. Look out for my next blog post, in which I will summarise all of my code and the decisions made for the splot project.

from scipy2018 import PySAL==2.0rc2

Returning from scipy 2018

My wrap-up from an exciting week full of stories about code and community at scipy 2018. First: I am even more convinced that I want to keep working on open source software in a fantastic community where scipy conference t-shirts are worn much like festival t-shirts. Second: I was very impressed by the number of new ideas, creative solutions and jokes (see the ‘catterplot’ and all the lightning talks) encountered at the conference. Lastly, Dani asked me at the end of the sprints, “What changed your life this week?” – “Of course meeting the pysal team, my mentors and experiencing the spirit of coding together to finish the release of splot!”

I also had the chance to introduce splot to the broader python community in a lightning talk, which you can see if you follow the link.

First splot release

The big announcement this week is that we successfully released a preliminary version of splot as part of PySAL 2.0rc2.

You can install and access splot via PySAL 2.0:

pip install PySAL==2.0rc2

You can download the PySAL-2.0 file from Release notes, and some statistics about PySAL 2.0 can be accessed here. More information about migrating to PySAL 2.0 can be found here. Our brand-new team website, which is partly still in the making, can be accessed here.

Extended functionality

Alongside this exciting news, I have continued extending and fine-tuning splot's functionality to make it more user friendly. For example, you can now use splot.esda.moran_scatterplot() to plot all esda.moran objects, instead of calling functions specific to Moran, Moran_Local, Moran_BV, ….

from splot.esda import moran_scatterplot

fig, axs = plt.subplots(2, 2, figsize=(10,10),
                        subplot_kw={'aspect': 'equal'})

moran_scatterplot(moran, p=0.05, ax=axs[0,0])
moran_scatterplot(moran_loc, p=0.05, ax=axs[1,0])
moran_scatterplot(moran_bv, p=0.05, ax=axs[0,1])
moran_scatterplot(moran_loc_bv, p=0.05, ax=axs[1,1])
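
One common way to offer a single entry point like this is to dispatch on the type of the object passed in. The sketch below illustrates that pattern with the standard library; the classes are empty stand-ins rather than the real esda classes, the function `scatterplot` is hypothetical, and this is not necessarily how splot implements `moran_scatterplot`.

```python
from functools import singledispatch


# Stand-in classes for illustration only; these are NOT the real esda classes.
class Moran:
    pass


class MoranLocal:
    pass


@singledispatch
def scatterplot(stat):
    """Single entry point; fails loudly for unsupported objects."""
    raise TypeError(f"unsupported object: {type(stat).__name__}")


@scatterplot.register(Moran)
def _(stat):
    # a real implementation would draw the global Moran scatterplot here
    return "global Moran scatterplot"


@scatterplot.register(MoranLocal)
def _(stat):
    # a real implementation would draw the local Moran scatterplot here
    return "local Moran scatterplot"


global_result = scatterplot(Moran())
local_result = scatterplot(MoranLocal())
```

The caller only ever needs to remember one function name, and unsupported objects raise a clear error instead of producing a wrong plot.
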
Furthermore, I implemented moran_facet(), which lets you plot Moran statistics calculated for a variety of attributes:

from splot.esda import moran_facet

fig, axarr = moran_facet(moran_matrix)

Milestone 2: Sprinting towards an `splot` release

I am happy to announce that a first experimental release of splot is near! The whole mentoring and PySAL development team, including me, the GSoC student, will be meeting at SciPy 2018 to prepare a common release of all PySAL sub-packages. You will find us coding together at lunch, in coffee breaks and during the sprints at the end of the conference to get the release ready by next weekend.

From first steps to mid sprint

After we had decided what the API should look like and set the focus on implementing views in matplotlib, I was busy creating and implementing new visualisations for esda and libpysal.

Levi John Wolf recently created libpysal functionality that allows you to “snap” neighbouring polygons back together, correcting incorrectly separated nodes and edges that stem from data digitisation errors. This error of “non-touching” polygons is common and needs to be corrected for spatial analysis. A typical workflow to assess this error using libpysal and splot could look like this:

First we import all necessary packages.

import libpysal.api as lp
import libpysal
from libpysal import examples
import matplotlib.pyplot as plt
import geopandas as gpd
from splot.libpysal import plot_spatial_weights

Second, we load the data we want to assess into a geopandas dataframe and calculate spatial weights. (We will use existing `libpysal.examples` data.)

gdf = gpd.read_file(libpysal.examples.get_path('43MUE250GC_SIR.shp'))

weights = lp.Queen.from_dataframe(gdf)

libpysal automatically warns us if our dataset contains islands. Islands are polygons that do not share edges or nodes with adjacent polygons. This can for example be the case if polygons are truly not neighbouring, e.g. when two land parcels are separated by a river. However, these islands often stem from human error when digitizing features into polygons.

/Users/steffie/code/libpysal/libpysal/weights/ UserWarning: There are 30 disconnected observations
  warnings.warn("There are %d disconnected observations" % ni)
/Users/steffie/code/libpysal/libpysal/weights/ UserWarning: Island ids: 0, 1, 5, 24, 28, 81, 95, 102, 108, 110, 120, 123, 140, 170, 176, 224, 240, 248, 254, 255, 256, 257, 262, 277, 292, 295, 304, 322, 358, 375
  warnings.warn("Island ids: %s" % ', '.join(str(island) for island in self.islands))
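
In terms of data, an island is simply an observation whose list of neighbors is empty. The sketch below shows that check on a small, hypothetical neighbor dictionary; `find_islands` is an illustrative helper and not part of libpysal, which reports the islands itself, as the warning above shows.

```python
# Hypothetical neighbor dictionary: polygon id -> ids of adjacent polygons.
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: [], 4: []}


def find_islands(neighbors):
    """Return the ids of observations that have no neighbors at all."""
    return sorted(i for i, nbrs in neighbors.items() if not nbrs)


islands = find_islands(neighbors)
message = "There are %d disconnected observations" % len(islands)
```

Here polygons 3 and 4 are islands, so any statistic that relies on neighboring values has nothing to work with for them.
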

This unwanted error can now be assessed using splot.libpysal.plot_spatial_weights functionality:

plot_spatial_weights(weights, gdf)

This visualisation depicts the spatial weights network, a network of connections from the centroid of each polygon to the centroids of its neighbours. As we can see, there are many polygons in the south and west of this map that are not connected to their neighbours. We can use libpysal.weights.util.nonplanar_neighbors to correct this error and visualise the result with splot.libpysal.plot_spatial_weights.

wnp = libpysal.weights.util.nonplanar_neighbors(weights, gdf)

plot_spatial_weights(wnp, gdf)

As we can see, all erroneous islands are now stored as neighbors in our new weights object, depicted by the new joins displayed in orange. This example and more can be tested by users via splot's extended documentation in jupyter notebooks.

From mid sprint to full sprint

Additionally, the splot dev team has started to reach out to the geopandas and Yellowbrick dev teams in order to share knowledge and collaborate. In future, splot functionality will depend on data being passed in as a geopandas dataframe. We would therefore like to start a collaboration on possibly coordinated release dates and potential joint visualisation projects, which will be discussed this week. Results and ideas will be collected in this issue.

Yellowbrick extends the Scikit-Learn API with diagnostic visualisations to steer machine learning processes. Its mission to extend an existing API by offering visualisations that can steer a data analysis process is very close to splot's. Since Yellowbrick is already an established and popular package and seems to have faced similar decisions, we decided to contact its dev team and are looking forward to a conversation in two weeks.

See you all at SciPy 2018!