I am happy to announce that a first experimental release of splot is near! The whole PySAL mentoring and development team, including me as the GSoC student, will be meeting at SciPy 2018 to prepare a common release of all PySAL sub-packages. You will find us coding together at lunch, in coffee breaks and during the sprints at the end of the conference to get the release ready by next weekend.
From first steps to mid sprint
After we had decided what the API should look like and set the focus on implementing views in matplotlib, I was busy creating and implementing new visualisations for splot.
Levi John Wolf recently created libpysal functionality that allows users to “snap” neighbouring polygons back together, correcting nodes and edges that were incorrectly separated by data digitisation errors. Such “non-touching” polygons are a common error and need to be corrected before spatial analysis. A typical workflow to assess this error using splot could look like this:
First we import all necessary packages.
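import libpysal  # needed for libpysal.weights.util further below
import libpysal.api as lp  # provides the Queen weights constructor used below
from libpysal import examples
import matplotlib.pyplot as plt
import geopandas as gpd
from splot.libpysal import plot_spatial_weights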
Second, we load the data we want to assess into a geopandas GeoDataFrame and calculate Queen contiguity spatial weights. (We will use existing `libpysal.examples` data.)
gdf = gpd.read_file(examples.get_path('43MUE250GC_SIR.shp'))
weights = lp.Queen.from_dataframe(gdf)
libpysal automatically warns us if our dataset contains islands. Islands are polygons that do not share any edge or node with other polygons. This can be the case if polygons are truly not neighbouring, e.g. when two land parcels are separated by a river. However, islands often stem from human error when digitising features into polygons.
/Users/steffie/code/libpysal/libpysal/weights/weights.py:189: UserWarning: There are 30 disconnected observations
warnings.warn("There are %d disconnected observations" % ni)
/Users/steffie/code/libpysal/libpysal/weights/weights.py:190: UserWarning: Island ids: 0, 1, 5, 24, 28, 81, 95, 102, 108, 110, 120, 123, 140, 170, 176, 224, 240, 248, 254, 255, 256, 257, 262, 277, 292, 295, 304, 322, 358, 375
warnings.warn("Island ids: %s" % ', '.join(str(island) for island in self.islands))
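To make the notion of an island concrete: Queen contiguity treats two polygons as neighbours when they share at least one vertex, and an island is an observation whose neighbour list is empty (libpysal's weights object stores neighbours as a dict and lists islands in its `islands` attribute). The sketch below is a simplified stand-in that matches vertices exactly; it is written for illustration and is not libpysal's implementation. The toy polygons and the helpers `queen_neighbors` and `islands` are hypothetical. It shows how a digitisation offset of only 0.001 units is enough to turn a polygon into an island.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each polygon is a list of (x, y) vertices.
# Polygons 0 and 1 share the edge x = 1. Polygon 2 was digitised with its
# left edge at x = 2.001 instead of x = 2, so it shares no vertex at all.
polygons = {
    0: [(0, 0), (1, 0), (1, 1), (0, 1)],
    1: [(1, 0), (2, 0), (2, 1), (1, 1)],
    2: [(2.001, 0), (3, 0), (3, 1), (2.001, 1)],
}


def queen_neighbors(polys):
    """Simplified Queen contiguity: neighbours share at least one vertex."""
    # Map every vertex to the set of polygons that use it.
    owners = defaultdict(set)
    for pid, ring in polys.items():
        for vertex in ring:
            owners[vertex].add(pid)
    neighbors = {pid: set() for pid in polys}
    # Every pair of polygons using the same vertex becomes a neighbour pair.
    for pids in owners.values():
        for a, b in combinations(sorted(pids), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {pid: sorted(nbrs) for pid, nbrs in neighbors.items()}


def islands(neighbors):
    """Islands are observations with an empty neighbour list."""
    return [pid for pid, nbrs in neighbors.items() if not nbrs]


neighbors = queen_neighbors(polygons)
print(neighbors)           # {0: [1], 1: [0], 2: []}
print(islands(neighbors))  # [2]
```

Exact vertex matching is what makes tiny digitisation offsets so harmful: polygon 2 is visually adjacent to polygon 1, but no coordinate pair is identical.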
This unwanted error can now be assessed using splot's plot_spatial_weights:
plot_spatial_weights(weights, gdf)
This visualisation depicts the spatial weights network: the centroid of each polygon is connected to the centroid of each of its neighbours. As we can see, many polygons in the south and west of this map are not connected to their neighbours. We can use libpysal.weights.util.nonplanar_neighbors to correct this error and visualise the result with plot_spatial_weights:
wnp = libpysal.weights.util.nonplanar_neighbors(weights, gdf)
plot_spatial_weights(wnp, gdf)
As we can see, all erroneous islands are now stored as neighbours in our new weights object, depicted by the new joins displayed in orange. This example and more can be tested by users via splot's extended documentation in Jupyter notebooks.
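The idea behind the repair and the two plots can be sketched in plain Python. The plot draws a join between the centroids of every pair of neighbours, and the repair step adds joins between polygons whose boundaries lie within a small tolerance of each other even though they share no vertex. The code below is a simplified stand-in written for illustration, not how libpysal's nonplanar_neighbors is implemented: the toy polygons, the tolerance of 0.01 and all helper names (`centroid`, `boundary_gap`, `add_near_neighbors`, `network_edges`) are hypothetical, and the boundary gap is measured as the smallest vertex-to-edge distance, which assumes the two boundaries do not cross.

```python
import math
from itertools import combinations

# Hypothetical toy data: polygon 2 was digitised 0.001 units to the right of
# polygon 1, so an exact shared-vertex (Queen) check left it as an island.
polygons = {
    0: [(0, 0), (1, 0), (1, 1), (0, 1)],
    1: [(1, 0), (2, 0), (2, 1), (1, 1)],
    2: [(2.001, 0), (3, 0), (3, 1), (2.001, 1)],
}
neighbors = {0: [1], 1: [0], 2: []}  # contiguity result before the repair


def centroid(ring):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3 * area2), cy / (3 * area2))


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def boundary_gap(ring_a, ring_b):
    """Smallest vertex-to-edge distance between two polygon boundaries.

    Assumes the boundaries do not cross; crossing edges would need an
    explicit intersection test to report a gap of 0.
    """
    return min(
        point_segment_distance(p, a, b)
        for src, dst in ((ring_a, ring_b), (ring_b, ring_a))
        for p in src
        for a, b in zip(dst, dst[1:] + dst[:1])
    )


def add_near_neighbors(polys, nbrs, tolerance):
    """Join non-neighbouring polygons whose boundaries are within tolerance."""
    repaired = {pid: set(v) for pid, v in nbrs.items()}
    new_joins = []
    for i, j in combinations(sorted(polys), 2):
        if j not in repaired[i] and boundary_gap(polys[i], polys[j]) <= tolerance:
            repaired[i].add(j)
            repaired[j].add(i)
            new_joins.append((i, j))
    return {pid: sorted(v) for pid, v in repaired.items()}, new_joins


def network_edges(polys, nbrs):
    """Centroid-to-centroid segments, one per neighbour pair."""
    cents = {pid: centroid(ring) for pid, ring in polys.items()}
    return [(cents[i], cents[j]) for i in nbrs for j in nbrs[i] if i < j]


repaired, new_joins = add_near_neighbors(polygons, neighbors, tolerance=0.01)
print(repaired)   # {0: [1], 1: [0, 2], 2: [1]}
print(new_joins)  # [(1, 2)]
print(len(network_edges(polygons, neighbors)), len(network_edges(polygons, repaired)))  # 1 2
```

The pairs in `new_joins` play the role of the orange joins described above: they connect polygons that are close to each other but do not touch. The tolerance is the key design choice, since too large a value would also join polygons that are genuinely separated, such as parcels on opposite banks of a river.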
From mid sprint to full sprint
The splot dev team has started to reach out to the geopandas and Yellowbrick dev teams in order to share knowledge and collaborate.
In future, splot functionality will depend on data input as a geopandas GeoDataFrame. We would therefore like to start a collaboration on possibly coordinating release dates and on potential joint visualisation projects, which will be discussed this week. Results and ideas will be collected in this issue.
Yellowbrick extends the Scikit-Learn API with diagnostic visualisations to steer machine learning processes. Its mission to extend an existing API by offering visualisations that can steer a data analysis process is very close to splot's own. Since Yellowbrick is already an established and popular package and seems to have faced similar design decisions, we decided to contact its dev team and are looking forward to a conversation in two weeks.
See you all at SciPy 2018!