## Value-by-Alpha maps with `splot`

The last two weeks of GSoC were still fully dedicated to expanding `splot`'s functionality. You can, for example, now create so-called Value-by-Alpha maps using `splot.mapping`.

### What is a Value-by-Alpha choropleth?

In a nutshell, a Value-by-Alpha choropleth is a bivariate choropleth that uses the values of a second variable `y` as a transparency mask, determining how much of the choropleth displaying the first variable `x` is shown. In contrast to a cartogram, a Value-by-Alpha choropleth does not distort shapes and sizes but modifies the alpha channel (transparency) of polygons according to the second variable `y`.
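The alpha mask itself is conceptually just a rescaling of `y` to the [0, 1] interval; the sketch below illustrates the idea (it is not `splot`'s actual implementation):

```python
import numpy as np

# hypothetical values for the second variable y
y = np.array([10.0, 25.0, 40.0, 55.0, 70.0])

# rescale y to the [0, 1] interval so it can serve as an alpha channel
alpha = (y - y.min()) / (y.max() - y.min())

# depending on the desired direction, the mask can also be inverted
alpha_inverted = 1 - alpha
```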

### Value-by-Alpha functionality in `splot`

Imports you will need:

```
import libpysal as lp
from libpysal import examples
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import numpy as np

from splot.mapping import vba_choropleth
```

Let’s prepare the data

Load the example data into a `geopandas.GeoDataFrame` and inspect the column names. In this example we will use the `columbus.shp` file containing neighbourhood crime data from 1980.

```
link_to_data = examples.get_path('columbus.shp')
gdf = gpd.read_file(link_to_data)
```

We extract two arrays: `x` (housing value, in $1,000) and `y` (residential burglaries and vehicle thefts per 1,000 households).

```
x = gdf['HOVAL'].values
y = gdf['CRIME'].values
```

We can now create a Value-by-Alpha map using `splot`'s `vba_choropleth` functionality.

Let’s plot a Value-by-Alpha choropleth with `x` defining the RGB values and `y` defining the alpha value. For comparison, we plot a regular choropleth of `x` with `gdf.plot()`:

```
# create new figure
fig, axs = plt.subplots(1, 2, figsize=(20, 10))

# use gdf.plot() to create a regular choropleth
gdf.plot(column='HOVAL', scheme='quantiles', cmap='RdBu', ax=axs[0])

# use vba_choropleth to create a Value-by-Alpha choropleth
vba_choropleth(x, y, gdf,
               rgb_mapclassify=dict(classifier='quantiles'),
               alpha_mapclassify=dict(classifier='quantiles'),
               cmap='RdBu', ax=axs[1])

# set figure style
axs[0].set_title('normal Choropleth')
axs[0].set_axis_off()
axs[1].set_title('Value-by-Alpha Choropleth')

# plot
plt.show()
```

You can see that the original choropleth fades into transparency wherever there is a high `y` value.

You have the option to bin or classify your `x` and `y` values. `splot` uses `mapclassify` to bin your data and displays the new colour and alpha ranges:

```
# create new figure
fig, axs = plt.subplots(2, 2, figsize=(20, 10))

# classifier quantiles
vba_choropleth(y, x, gdf, cmap='viridis', ax=axs[0, 0],
               rgb_mapclassify=dict(classifier='quantiles', k=3),
               alpha_mapclassify=dict(classifier='quantiles', k=3))

# classifier natural_breaks
vba_choropleth(y, x, gdf, cmap='viridis', ax=axs[0, 1],
               rgb_mapclassify=dict(classifier='natural_breaks'),
               alpha_mapclassify=dict(classifier='natural_breaks'))

# classifier std_mean
vba_choropleth(y, x, gdf, cmap='viridis', ax=axs[1, 0],
               rgb_mapclassify=dict(classifier='std_mean'),
               alpha_mapclassify=dict(classifier='std_mean'))

# classifier fisher_jenks
vba_choropleth(y, x, gdf, cmap='viridis', ax=axs[1, 1],
               rgb_mapclassify=dict(classifier='fisher_jenks', k=3),
               alpha_mapclassify=dict(classifier='fisher_jenks', k=3))

plt.show()
```

Sometimes it is important in geospatial analysis to actually see the high values and let the small values fade out. With the `revert_alpha=True` argument, you can invert the transparency of the `y` values.

```
# create new figure
fig, axs = plt.subplots(1, 2, figsize=(20, 10))

# create a vba_choropleth
vba_choropleth(x, y, gdf,
               rgb_mapclassify=dict(classifier='quantiles'),
               alpha_mapclassify=dict(classifier='quantiles'),
               cmap='RdBu', ax=axs[0],
               revert_alpha=False)

# set the revert_alpha argument to True
vba_choropleth(x, y, gdf,
               rgb_mapclassify=dict(classifier='quantiles'),
               alpha_mapclassify=dict(classifier='quantiles'),
               cmap='RdBu', ax=axs[1],
               revert_alpha=True)

# set figure style
axs[0].set_title('revert_alpha = False')
axs[1].set_title('revert_alpha = True')

# plot
plt.show()
```

You can use the `divergent` argument to display divergent alpha values. Values at the extremes of your data range will be displayed with an alpha value of 1, while values towards the middle of the range fade out towards an alpha value of 0.
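One way to think about the divergent option is as folding the rescaled `y` values around the midpoint, so that distance from the middle of the data range drives opacity. A minimal sketch of that idea (not `splot`'s internal code):

```python
import numpy as np

# hypothetical data range
y = np.array([0.0, 25.0, 50.0, 75.0, 100.0])

# rescale to [0, 1], then measure distance from the midpoint
scaled = (y - y.min()) / (y.max() - y.min())
alpha = np.abs(2 * scaled - 1)  # extremes -> 1, middle -> 0
```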

```
# create new figure
fig, axs = plt.subplots(1, 2, figsize=(20, 10))

# create a vba_choropleth
vba_choropleth(y, x, gdf, cmap='RdBu',
               divergent=False, ax=axs[0])

# set divergent to True
vba_choropleth(y, x, gdf, cmap='RdBu',
               divergent=True, ax=axs[1])

# set figure style
axs[0].set_title('divergent = False')
axs[0].set_axis_off()
axs[1].set_title('divergent = True')

# plot
plt.show()
```

Lastly, if your values are classified, you have the option to add a legend to your map:

```
fig, ax = plt.subplots(figsize=(15, 10))
vba_choropleth(x, y, gdf,
               alpha_mapclassify=dict(classifier='quantiles', k=5),
               rgb_mapclassify=dict(classifier='quantiles', k=5),
               legend=True, ax=ax)
plt.show()
```

Since `PySAL` is currently being refactored, merging and cleaning PRs has become a little more challenging. This new functionality still depends on PR #28, which will be finalised soon.

### Outlook to the last week of GSoC

Next week will already be the last week of GSoC. I am sad that this great summer is over, but am looking forward to keeping on working on `splot` and `PySAL` in future. Look out for my next blog post in which I will summarise all of my code and decisions made for the `splot` project.

## from scipy2018 import PySAL==2.0rc2

### Returning from SciPy 2018

My wrap-up from an exciting week full of stories about code and community at SciPy 2018. First: I am even more convinced to keep working on open source software in a fantastic community, where SciPy conference t-shirts are worn somewhat like festival t-shirts. Second: I was very impressed by the amount of new ideas, creative solutions and jokes (see the ‘catterplot’ and all the lightning talks) encountered at the conference. Lastly, Dani asked me at the end of the sprints, “What changed your life this week?” – “Of course meeting the PySAL team, my mentors, and experiencing the spirit of coding together to finish the release of `splot`!”

I also had the chance to introduce `splot` to the broader Python community in a lightning talk, which you can watch if you follow the link.

### First splot release

The big announcement this week is that we successfully made a preliminary release of `splot` as part of `PySAL 2.0rc2`.

You can install and access `splot` via `PySAL 2.0`:

`pip install PySAL==2.0rc2`
You can download the `PySAL-2.0` file from pypi.org. Release notes and some statistics about `PySAL 2.0` can be accessed here. More information about migrating to `PySAL 2.0` can be found here. And our brand-new team website (which is partly still in the making) can be accessed here.

### Extended functionality

Besides this exciting news, I have continued extending and fine-tuning `splot`'s functionality to make it more user friendly. For example, you can now use `splot.esda.moran_scatterplot()` to plot all `esda.moran` objects, instead of calling functions specific to `Moran`, `Moran_Local`, `Moran_BV`, etc.
```
from splot.esda import moran_scatterplot

fig, axs = plt.subplots(2, 2, figsize=(10, 10),
                        subplot_kw={'aspect': 'equal'})

moran_scatterplot(moran, p=0.05, ax=axs[0, 0])
moran_scatterplot(moran_loc, p=0.05, ax=axs[1, 0])
moran_scatterplot(moran_bv, p=0.05, ax=axs[0, 1])
moran_scatterplot(moran_loc_bv, p=0.05, ax=axs[1, 1])
plt.show()
```

Furthermore, I implemented `moran_facet()`, which allows plotting Moran statistics calculated for a variety of attributes:

```
from splot.esda import moran_facet

fig, axarr = moran_facet(moran_matrix)
plt.show()
```

## Milestone 2: Sprinting towards an `splot` release

I am happy to announce that a first experimental release of `splot` is near! The whole mentoring and `PySAL` development team including the GSoC student, me, will be meeting at SciPy 2018 to prepare a common release of all `PySAL` sub-packages. You will find us coding together at lunch, in coffee breaks and during the sprints at the end of the conference to get the release ready by next weekend.

### From first steps to mid sprint

After the decision was made on what the API should look like, with the focus set on implementing views in `matplotlib`, I was busy creating and implementing new visualisations for `esda` and `libpysal`.

Levi John Wolf recently created `libpysal` functionality that allows “snapping” neighbouring polygons back together, to correct incorrectly separated nodes and edges stemming from data digitisation errors. This error of “non-touching” polygons is common and needs to be corrected for spatial analysis. A typical workflow to assess this error using `esda` and `splot` could look like this:

First we import all necessary packages.

```import libpysal.api as lp
import libpysal
from libpysal import examples
import matplotlib.pyplot as plt
import geopandas as gpd
from splot.libpysal import plot_spatial_weights
```

Second, we load the data we want to assess into a geopandas dataframe and calculate spatial weights. (We will use existing `libpysal.example` data.)

```gdf = gpd.read_file(libpysal.examples.get_path('43MUE250GC_SIR.shp'))

weights = lp.Queen.from_dataframe(gdf)
```

`libpysal` automatically warns us if our dataset contains islands. Islands are polygons that do not share edges and nodes with adjacent polygons. This can be the case if polygons are truly not neighbouring, e.g. when two land parcels are separated by a river. Often, however, these islands stem from human error when digitising features into polygons.

```/Users/steffie/code/libpysal/libpysal/weights/weights.py:189: UserWarning: There are 30 disconnected observations
warnings.warn("There are %d disconnected observations" % ni)
/Users/steffie/code/libpysal/libpysal/weights/weights.py:190: UserWarning: Island ids: 0, 1, 5, 24, 28, 81, 95, 102, 108, 110, 120, 123, 140, 170, 176, 224, 240, 248, 254, 255, 256, 257, 262, 277, 292, 295, 304, 322, 358, 375
warnings.warn("Island ids: %s" % ', '.join(str(island) for island in self.islands))```

This unwanted error can now be assessed using `splot.libpysal.plot_spatial_weights` functionality:

```plot_spatial_weights(weights, gdf)
plt.show()
```

This visualisation depicts the spatial weights network, a network of connections from the centroid of each polygon to the centroids of its neighbours. As we can see, there are many polygons in the south and west of this map that are not connected to their neighbours. We can use `libpysal.weights.util.nonplanar_neighbors` to correct this error and visualise the result with `splot.libpysal.plot_spatial_weights`.

```wnp = libpysal.weights.util.nonplanar_neighbors(weights, gdf)

plot_spatial_weights(wnp, gdf)
plt.show()

```

As we can see, all erroneous islands are now stored as neighbours in our new weights object, depicted by the new joins displayed in orange. This example and more can be tried out via `splot`'s extended documentation in Jupyter notebooks.

### From mid sprint to full sprint

Additionally, the `splot` dev team has started to reach out to the `geopandas` and `Yellowbrick` dev teams in order to share knowledge and collaborate. `splot` functionality will depend on data input as a `geopandas` dataframe in future. We would therefore like to start a collaboration on coordinating release dates and on potential joint visualisation projects, which will be discussed this week. Results and ideas will be collected in this issue.

`Yellowbrick` extends the scikit-learn API with diagnostic visualisations to steer machine learning processes. Its mission of extending an existing API with visualisations that can steer a data analysis process is very close to `splot`'s. Since `Yellowbrick` is already an established and popular package and seems to have faced similar decisions, we decided to contact its dev team and are looking forward to a conversation in two weeks.

See you all at SciPy 2018!

## Lessons learned: implementing `splot`

In my last blog post I explained the new API structure the `splot` team decided on. In my last two weeks at GSoC I have been working on the implementation of this API for the `giddy` and `esda` sub-packages. This functionality will be available in the release of `splot` which will likely be at the end of this GSoC.

### This is an example of how the new API can be used in future:

Imports

```
import esda
import matplotlib.pyplot as plt
import libpysal.api as lp
from libpysal import examples
import geopandas as gpd
```

Data preparation and statistical analysis

```
link = examples.get_path('columbus.shp')
gdf = gpd.read_file(link)
y = gdf['HOVAL'].values
w = lp.Queen.from_dataframe(gdf)
w.transform = 'r'
mloc = esda.moran.Moran_Local(y, w)
```

Plotting methods

```
mloc.plot(gdf, 'HOVAL')
mloc.plot(gdf, 'HOVAL', p=0.05, region_column='POLYID',
          mask=['1', '2', '3'], quadrant=1)
plt.show()
```

A new addition to the existing plotting functionality is the use of `**kwargs` as a function parameter. This allows users to customise any `splot` plot, or any PySAL plotting method leveraging `splot`, by passing in common keywords that the underlying Matplotlib function accepts (e.g. `color`, `c`, `markersize`, `linewidth`, etc.). By providing both sensible defaults and the possibility to change the style of a visualisation to the user's needs, `splot` allows for both quick exploratory use and customisation for publication.
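The mechanism behind this is plain keyword-argument forwarding with overridable defaults. A simplified, illustrative sketch (the function name is hypothetical, not `splot`'s actual internals):

```python
def styled_plot(x, y, **kwargs):
    """Set sensible defaults, but let any user-supplied keyword win."""
    kwargs.setdefault('color', 'tab:blue')
    kwargs.setdefault('linewidth', 1.0)
    # in a real plotting function these keywords would be passed
    # straight on to the underlying Matplotlib call,
    # e.g. ax.plot(x, y, **kwargs)
    return kwargs

defaults = styled_plot([1, 2], [3, 4])           # uses the defaults
custom = styled_plot([1, 2], [3, 4], color='k')  # user override wins
```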

### Lessons Learned: Writing Library Code

During the first half of GSoC I dove into the world of open source code and spatial statistics head first. I was not aware of the fact that writing code for a library included much more than knowing how to write code and how to use git. In this section I would like to share my experiences and tips for writing open source library code with you:

#### My workflow creating new functionality:

1. It starts with the fun: write some code!
2. Create a function from the code if that has not already happened. Usually I iterate on a function until I am happy with the visual output. I therefore use `.py` files for my source code and Jupyter notebooks to check the generated visualisations. Jupyter Lab, for example, provides a fantastic tool to work on both at once.
3. After drafting the main functionality, I think about the elements that should be made configurable for a user, such as colours, design elements and masking options, and implement my ideas.
4. When the functionality is thought out and ready to be used, I write a doc-string containing parameters, returns and examples. This will later be used by Sphinx to create the complete documentation for the package.
5. Now, I write a unit test that exercises the function and ensures that it keeps working when functionality is added or changed in future. These unit tests are run on Travis CI whenever someone makes changes to `splot` in a pull request.
6. Now my functionality is ready to meet `splot`. I make a pull request and check that Travis CI gives me a green light.

#### Tips to optimise this workflow which saved me a lot of time and helped debug my library code:

• While working in a Jupyter notebook or Jupyter Lab using code from a Python file, Jupyter does not pick up any changes once it has imported a function. This slows down the process of iteratively changing a function and checking the resulting visualisation output. Using `reload` allowed me to make edits in the Python file and quickly check the resulting changes in Jupyter without restarting the entire kernel.
```
from importlib import reload
reload(my_module)  # re-import my_module after editing it on disk
```
• Running `pyflakes` on the project repository in the terminal helped me to clean up my code when it got a little messy. `pyflakes` picks up on loads of common errors like missing imports, unused imports and unused variables, allowing for a quick clean-up before making a pull request.
• Similarly, running `pep8` makes sure my code and documentation conform to the Python style guide.
• Running `nosetests .` in my local project folder runs all the unit tests in the same way as they are run on Travis CI. This provides an additional check that changes to the code are correct before I commit or make a pull request.
• Failing tests on Travis CI can have many reasons. Running nosetests locally helps to check whether my code is the problem or whether, for example, the `.travis.yml` file needs to be updated (e.g. missing imports for new functionality). This file basically tells Travis CI what packages to install and how to run the tests. When Travis CI is still failing, check the log, which shows the details of why testing failed. If it gets really tricky, you can recreate the Travis CI run by executing the steps in the `.yml` file locally, one by one, which will give you an indication of when things start to go wrong.

## Designing the splot API

The last two weeks of GSoC with PySAL and `splot` were all about designing the splot package structure and creating a common API. The decisions made during this time provide a preliminary blueprint for all visualisations and functionality to come.

In this blog post, I will provide a summary of the decisions that were made in the process. If you are interested in having a closer look, the discussions between developers, the community and the student are openly accessible in our GSoC 2018 project on the splot repository. A collection of the functionality that will be supported in future can be found here, and the decisions on how the package structure and API will look will be made here.

### The splot package structure and API (as it currently stands):

In future, `splot`'s functionality can be accessed in two ways.

First of all, basic `splot` visualisations are exposed as `.plot` methods on PySAL objects. For example:

```
from giddy.directional import Rose

# ... (data preparation)
rose = Rose(Y, w)
rose.plot_heatmap()
plt.show()
```

Furthermore, all basic visualisations, and more (to be defined), can be called via sub-package namespaces. For example:

```from splot.giddy import dynamic_lisa_composite
from splot.esda import ...
```

Lastly, the majority of PySAL developers prefer Matplotlib as the default backend. I therefore decided to focus on implementing most visualisations in a Matplotlib version first before continuing to develop the Bokeh backend.

### Why we decided to prioritise Matplotlib over Bokeh

I found out over the last weeks that it took much more of my time to implement a visualisation in Bokeh than in Matplotlib. I can think of four main reasons for this:

1. I am more familiar with Matplotlib and naturally quicker in creating visualisations with this backend.
2. Matplotlib seems to have better documentation and more examples on Stack Overflow and elsewhere, which speeds up the implementation of specialised visualisations.
3. I found a couple of things that were harder to implement in Bokeh, seemingly because of missing features and design choices.
4. There are more packages that already build on Matplotlib, like Geopandas' `.plot` functionality or Seaborn, that can be leveraged.

When visualising geographical data in maps, it is for example important to autoscale plots to keep their aspect ratio when the plotted figure changes size or format. This helps avoid distortion in spatially explicit representations.

To achieve this in matplotlib one can simply add the line:

`ax.set_aspect('equal')`

In Bokeh, however, this did not seem to be implemented yet, so I had to create this utility function:

```
def calc_data_aspect(plot_height, plot_width, bounds):
    # Deal with data ranges in Bokeh:
    # make a meter in x and y the same in pixel lengths
    aspect_box = plot_height / plot_width  # 2 / 1 = 2
    xmin, ymin, xmax, ymax = bounds
    x_range = xmax - xmin  # 1 = 1 - 0
    y_range = ymax - ymin  # 3 = 3 - 0
    aspect_data = y_range / x_range  # 3 / 1 = 3

    if aspect_data > aspect_box:
        # we need to increase x_range,
        # such that aspect_data becomes equal to aspect_box
        halfrange = 0.5 * x_range * (aspect_data / aspect_box - 1)
        # 0.5 * 1 * (3 / 2 - 1) = 0.25
        xmin -= halfrange  # 0 - 0.25 = -0.25
        xmax += halfrange  # 1 + 0.25 = 1.25
    else:
        # we need to increase y_range
        halfrange = 0.5 * y_range * (aspect_box / aspect_data - 1)
        ymin -= halfrange
        ymax += halfrange

    # Add a bit of margin to both x and y
    margin = 0.03
    xmin -= (xmax - xmin) / 2 * margin
    xmax += (xmax - xmin) / 2 * margin
    ymin -= (ymax - ymin) / 2 * margin
    ymax += (ymax - ymin) / 2 * margin
    return xmin, xmax, ymin, ymax
```

It was also easier for me to use Jupyter notebook `interact` widgets with Matplotlib than with Bokeh. With Matplotlib, `interact` simply regenerates the whole figure and was easy to implement:

```
interact(_dynamic_lisa_widget_update,
         timex=coldict, timey=coldict, rose=fixed(rose),
         gdf=fixed(gdf), p=fixed(p), figsize=fixed(figsize))
```

With Bokeh as a backend, one needs to update the datasource rather than the figure. This is much faster (the whole figure does not have to be re-drawn), but it is harder to implement because the datasource is hidden inside the previous plotting functions and therefore not directly accessible to `interact`.
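For reference, the Bokeh pattern looks roughly like this: glyphs are bound to a `ColumnDataSource`, and an update replaces the source's data instead of re-creating the figure (a minimal sketch, not `splot` code):

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# glyphs are bound to a data source rather than to the figure itself
source = ColumnDataSource(data=dict(x=[0, 1, 2], y=[2, 1, 3]))
fig = figure()
fig.line(x='x', y='y', source=source)

# updating the plot means swapping the source's data in place;
# the figure picks up the change without being re-drawn
source.data = dict(x=[0, 1, 2], y=[1, 3, 2])
```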

In conclusion, Bokeh is a great tool for one-off customised visualisations, but it does not yet seem to fully support implementing library functions easily.

Another reason is that using Bokeh in Jupyter Lab requires loading a Jupyter Lab extension, which for some reason did not build for me. Nevertheless, Bokeh is still a relatively young package with great potential, and I will definitely explore it and other more interactive visualisation libraries later in GSoC.

### Example of API usage

Now that the preliminary API design is chosen, I have started to implement functions according to this pattern. This pull request contains the first five functions and their documentation for `giddy.directional`. This is an example of the end result for one of the five:

## Developing splot – Code Updates

During the coding period of GSoC, you can follow me:

## from GSoC import splot

This blog post will introduce you to the `splot` package, how it is designed, and what exactly I will be working on during the Google Summer of Code.

#### What is splot?

The goal of the `splot` package is to meet the growing demand for a simple-to-use, lightweight interface that connects PySAL to different popular visualization toolkits like `bokeh`, `matplotlib` or `folium`. The `splot` package will ultimately provide users with both static plots ready for publication and interactive visualizations that allow for quick iteration over ideas and data exploration. Please visit the `viz` module website for more detailed information and first examples of a possible design for such a package.

#### Design and components of splot

The `splot` package is ultimately structured into three levels. The highest level directly provides visualization functions for end users. Two lower layers set the basis for easy visualization by, first, converting PySAL geometries (polygon, line, shape) into Matplotlib geometries and, second, allowing for subsetting (e.g. plotting only part of a .shp file), aligning (e.g. the same axes for different layers) and transforming (e.g. classifying values to colors) graphical objects. So far, the existing Moran plot provides a great example of what such functionality could look like.
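The lowest layer's job, turning raw geometries into something Matplotlib can draw, can be sketched as follows (a minimal, hypothetical example; the vertex list stands in for a PySAL geometry):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, safe for scripts
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection

# hypothetical polygon vertices, standing in for a PySAL geometry
vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]

# lowest layer: convert the geometry into a Matplotlib patch
patch = Polygon(vertices, closed=True)

# middle layer: transform values to colours via a colormap
collection = PatchCollection([patch], cmap='viridis')
collection.set_array(np.array([0.5]))  # value -> colour

# highest level: the user-facing plot
fig, ax = plt.subplots()
ax.add_collection(collection)
ax.autoscale_view()
ax.set_aspect('equal')
```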

#### GSoC project

Initial visualizations like LISA and choropleth maps were slated for development in the splot package, but many functions remain to be coded. Besides refining existing plots, the common view is that the Matplotlib interface needs to be extended with new maps (Join Count BW, regression maps), scatter plots (pairwise regression plots, …) and many more visualizations.

Next to providing these missing static plots, the GSoC project will leverage more recent technologies in the constantly evolving visualization space. This geovisualization project provides the scope to incorporate interactive visualizations developed in Jupyter within the `splot` package. It allows for the exploration of potential new interfaces to alternative packages like Bokeh (plots with interactivity such as tooltips and zooming) and Folium (for plotting on top of web-sourced base layers, e.g. OpenStreetMap).

In the first phase of this project, we will therefore create different visualizations in both a static version with Matplotlib and an interactive version with Bokeh. Secondly, we will create a common API for easy access to both versions. After adding documentation, we will be able to provide a complete and user-friendly package. Finally, we will explore how alternative visualization packages, like Vega, could be integrated into the `splot` package in future.

Additionally, we will refactor the package to ensure all functionality and documentation can be accessed in the `splot` namespace and work towards its inclusion into the PySAL user guide.