Articles on demvessias's Blog
https://blogs.python-gsoc.org
Updates on different articles published on demvessias's Blog (en)
Mon, 23 Aug 2021 19:45:51 +0000

Google Summer of Code Final Work Product
https://blogs.python-gsoc.org/en/demvessiass-blog/google-summer-of-code-final-work-product-3/

<a href="https://summerofcode.withgoogle.com/projects/#6653942668197888"><img alt="gsoc" height="50" src="https://developers.google.com/open-source/gsoc/resources/downloads/GSoC-logo-horizontal.svg"></a> <a href="https://summerofcode.withgoogle.com/projects/#6653942668197888"><img height="45" src="https://www.python.org/static/community_logos/python-logo.png"></a> <a href="https://fury.gl/latest/community.html"><img alt="fury" height="45" src="https://python-gsoc.org/logos/FURY.png"></a> <h1>Google Summer of Code 2021 Final Work Product</h1> <ul> <li><strong>Name:</strong> Bruno Messias</li> <li><strong>Organisation:</strong> Python Software Foundation</li> <li><strong>Sub-Organisation:</strong> FURY</li> <li><strong>Project:</strong> A system for collaborative visualization of large network layouts using FURY</li> </ul> <h2>Abstract</h2> <p>We changed some aspects of my project in the first meeting. Specifically, we focused our efforts on developing a streaming system using the WebRTC protocol that can be used in scenarios more generic than network visualization alone. In addition, we opted to develop the network visualization system for FURY as a separate repository and package, available <a href="https://github.com/fury-gl/helios">here</a>.
The name Helios was selected for this new network visualization system built on the FURY rendering pipeline.</p> <h2>Proposed Objectives</h2> <ul> <li>Create a streaming system (stadia-like) for FURY <ul> <li>Should work in a low-bandwidth scenario</li> <li>Should allow user interactions and collaboration across the Internet using a web browser</li> </ul></li> <li>Helios Network System objectives: <ul> <li>Implement the Force-Directed Algorithm with examples</li> <li>Implement the ForceAtlas2 algorithm using cugraph with examples</li> <li>Implement the Minimum-Distortion Embeddings algorithm (PyMDE) with examples</li> <li>Non-blocking network layout computation, avoiding the GIL using the shared-memory approach</li> <li>Create the documentation and the CI actions</li> </ul></li> <li>Stretch Goals: <ul> <li>Create an actor in FURY to draw text efficiently using shaders</li> <li>Add support for drawing millions of nodes in FURY</li> <li>Add support for controlling the OpenGL state in FURY</li> </ul></li> </ul> <h2>Objectives Completed</h2> <ul> <li><h3>Create a streaming system (stadia-like) for FURY</h3> <p>There are several reasons to have a streaming system for data visualization. Because I am doing my Ph.D. in a developing country, I always need to think of the least expensive ways to use the available computational resources. For example, with GPU prices increasing, it is often necessary to share a single GPU machine with users at different locations.</p> <p>To construct the streaming system for my project, we opted for three main properties and behaviors:</p> <ol> <li>avoid blocking the code execution in the main thread (where the VTK/FURY instance resides)</li> <li>work in a low-bandwidth environment</li> <li>make it easy and cheap to share the rendering result.
For example, using the free version of <code>ngrok</code></li> </ol> <p>To achieve the first property we need to circumvent the GIL and allow Python code to execute in parallel. Using the threading module alone is not good enough to attain real parallelism, as Python calls in the same process cannot execute concurrently. In addition, for better organization it is desirable to define the server system as a module decoupled from the rendering pipeline. Therefore, I chose the multiprocessing approach. The second and third properties can only be achieved by choosing a suitable protocol for transferring the rendered results to the client. We opted to implement two streaming protocols: MJPEG and WebRTC. The latter is more suitable for low-bandwidth scenarios [1].</p> <p>The image below shows a simple representation of the streaming system.</p> <p><img alt="..." height="400" src="https://user-images.githubusercontent.com/6979335/121934889-33ff1480-cd1e-11eb-89a4-562fbb953ba4.png"></p> <p>The video below shows how our streaming system works smoothly and can be easily integrated into a Jupyter notebook.</p></li> </ul> <p><a href="https://user-images.githubusercontent.com/6979335/130284952-2ffbf117-7119-4048-b7aa-428e0162fb7a.mp4">Video: WebRTC Streaming + Ngrok</a></p> <p><a href="https://user-images.githubusercontent.com/6979335/130284261-20e84622-427e-4a59-a46f-6a33f5473025.mp4">Video: WebRTC Streaming + Jupyter</a></p> <p><em>Pull Requests:</em></p> <ul> <li>https://github.com/fury-gl/fury/pull/480</li> </ul> <ul> <li><h3>2D and 3D marker actor</h3> <p>This feature gave FURY the ability to efficiently draw millions of markers and impostor 3D spheres. This feature was essential for the development of Helios.
This feature works using signed distance fields (SDFs); you can get more information about how SDFs work in [4].</p> <p>The image below shows 1 million markers rendered using Intel HD Graphics 3000.</p></li> </ul> <p><img src="https://user-images.githubusercontent.com/6979335/116004971-70927780-a5db-11eb-8363-8c0757574eb4.png"></p> <ul> <li><h3>Fine-Tuning the OpenGL State</h3> <p>Sometimes users may need finer control over how OpenGL renders the actors. This can be useful when they need to create specialized visualization effects or to improve performance.</p> <p>In this PR I worked on a feature that allows FURY to control the OpenGL context created by VTK.</p> <p><em>Pull Request:</em></p> <ul> <li>https://github.com/fury-gl/fury/pull/432</li> </ul></li> <li><h3>Helios Network Visualization Lib: Network Layout Algorithms</h3> <p><strong>Case 1:</strong> Suppose that you need to monitor a hashtag and build a social graph. You want to interact with the graph and at the same time get insights about the structure of the user interactions. To get those insights you can perform a node embedding using any kind of network layout algorithm, such as force-directed or minimum-distortion embeddings.</p> <p><strong>Case 2:</strong> Suppose that you are modelling a network dynamic such as an epidemic spreading or a Kuramoto model. In some of those network dynamics a node can change its state and the edges related to the node must be deleted. For example, in an epidemic model a node can represent a person who died due to a disease.
Consequently, the layout of the network must be recomputed to give better insights.</p> <p>In the described cases, if we want a better user experience (UX) and at the same time a more practical and insightful application of Helios, the employed layout algorithms should not block any kind of computation in the main thread.</p> <p>In Helios we already have a lib written in C (with a Python wrapper) which performs the force-directed layout algorithm using separate threads, avoiding the GIL problem and consequently avoiding blocking the main thread. But what about the other open-source network layout libs available on the internet? Unfortunately, most of those libs have not been implemented like Helios's force-directed method and, consequently, if we want to update the network layout, the Python interpreter will block the computation and user interaction in the network visualization.</p> <p>My solution for having PyMDE and CuGraph-ForceAtlas not block the main thread was to break the network layout method into two different types of processes, A and B, that communicate using the shared-memory approach. You can find more information about this PR in my posts [2] and [3].</p></li> </ul> <p>The image below shows an example that I made, available at https://github.com/fury-gl/helios/blob/main/docs/examples/viz_mde.py</p> <p><img src="https://user-images.githubusercontent.com/6979335/125310065-a3a9f480-e308-11eb-98d9-0ff5406a0e96.gif"> <em>Pull Requests:</em></p> <ul> <li><p><strong>MDE Layout:</strong> https://github.com/fury-gl/helios/pull/6</p></li> <li><p><strong>CuGraph ForceAtlas2:</strong> https://github.com/fury-gl/helios/pull/13</p></li> <li><p><strong>Force-Directed and MDE improvements:</strong> https://github.com/fury-gl/helios/pull/14</p></li> <li><h3>Helios Network Visualization Lib: Visual Aspects</h3></li> </ul> <p>I've made several improvements to give Helios better visual aspects. One of them was to provide smooth real-time network layout animations.
Because the layout computation happens in a different process than the one responsible for rendering the network, it was necessary to record the positions and communicate the layout state between both processes.</p> <p>The GIF below shows how the network layout through IPC behaved before these modifications.</p> <img alt="..." height="300" src="https://user-images.githubusercontent.com/6979335/126175596-e6e2b415-bd79-4d99-82e7-53e10548be8c.gif"> <p>Below, you can see how the visual aspect improved after those modifications.</p> <img alt="..." height="300" src="https://user-images.githubusercontent.com/6979335/126175583-c7d85f0a-3d0c-400e-bbdd-4cbcd2a36fed.gif"> <p><em>Pull Requests:</em></p> <ul> <li><p><strong>OpenGL SuperActors:</strong> https://github.com/fury-gl/helios/pull/1</p></li> <li><p><strong>Fixed the flickering effect:</strong> https://github.com/fury-gl/helios/pull/10</p></li> <li><p><strong>Improvements in the network node visual aspects:</strong> https://github.com/fury-gl/helios/pull/15</p></li> <li><p><strong>Smooth animations when using IPC layouts:</strong> https://github.com/fury-gl/helios/pull/17</p></li> <li><h3>Helios Network Visualization Lib: CI and Documentation</h3></li> </ul> <p>Because Helios is a project that began during my GSoC project, it was necessary to create the documentation, hosting, and more. We now have online documentation available at https://heliosnetwork.io/, although it still needs some improvements.</p> Below is the Helios logo, which was developed by my mentor Filipi Nascimento.
<img alt="Helios Network Logo" height="100" src="https://fury-gl.github.io/helios-website/_images/logo.png"> <p><em>Pull Requests:</em></p> <ul> <li><p><strong>CI and pytests:</strong> https://github.com/fury-gl/helios/pull/5, https://github.com/fury-gl/helios/pull/20</p></li> <li><p><strong>Helios logo, Sphinx Gallery and API documentation:</strong> https://github.com/fury-gl/helios/pull/18</p></li> <li><p><strong>Documentation improvements:</strong> https://github.com/fury-gl/helios/pull/8</p></li> <li><h3>Objectives in Progress</h3></li> <li><h3>Draw texts on FURY and Helios</h3> <p>These two PRs allow FURY and Helios to draw millions of characters in a VTK window instance with low computational resource consumption. I am still working on this, finishing the SDF font rendering, whose underlying theory was developed in [5].</p> <p><em>Pull Requests:</em></p> <ul> <li><p>https://github.com/fury-gl/helios/pull/24</p></li> <li><p>https://github.com/fury-gl/fury/pull/489</p> <p><img alt="..."
height="400" src="https://user-images.githubusercontent.com/6979335/129643743-6cb12c06-3415-4a02-ba43-ccc97003b02d.png"></p> </li> </ul></li> <li><h3>GSoC weekly Blogs</h3> <p>Weekly blogs were added to the FURY website.</p> <p><em>Pull Requests:</em></p> <ul> <li><strong>First Evaluation:</strong> https://github.com/fury-gl/fury/pull/476</li> <li><strong>Second Evaluation:</strong> TBD</li> </ul></li> </ul> <h2>Timeline</h2> <table> <tbody><tr class="header"> <th>Date</th> <th>Description</th> <th>Blog Link</th> </tr> </tbody><tbody> <tr class="odd"> <td>Week 1<br>(08-06-2021)</td> <td>Welcome to my weekly blogs!</td> <td><a href="https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-1-21/">Weekly Check-in #1</a></td> </tr> <tr class="even"> <td>Week 2<br>(14-06-2021)</td> <td>Post #1: A Stadia-like system for data visualization</td> <td><a href="https://blogs.python-gsoc.org/en/demvessiass-blog/post-1-a-stadia-like-system-for-data-visualization/">Weekly Check-in #2</a></td> </tr> <tr class="odd"> <td>Week 3<br>(21-06-2021)</td> <td>2D and 3D fake-impostor markers; fine-tuning the OpenGL state; shared memory support for the streaming system; first version of Helios, the network visualization lib for FURY</td> <td><a href="https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-3-15/">Weekly Check-in #3</a></td> </tr> <tr class="even"> <td>Week 4<br>(28-06-2021)</td> <td>Post #2: SOLID, monkey patching a python issue and network layouts through WebRTC</td> <td><a href="https://blogs.python-gsoc.org/en/demvessiass-blog/post-2-solid-monkey-patching-a-python-issue-and-network-layouts-through-webrtc/">Weekly Check-in #4</a></td> </tr> <tr class="odd"> <td>Week 5<br>(05-07-2021)</td> <td>Code refactoring; 2D network layouts for Helios; implemented the minimum-distortion embedding algorithm using the IPC approach</td> <td><a href="https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-5-14/">Weekly Check-in #5</a></td> </tr> <tr
class="even"> <td>Week 6<br>(12-07-2021)</td> <td>Post #3: Network layout algorithms using IPC</td> <td><a href="https://blogs.python-gsoc.org/en/demvessiass-blog/post-3-network-layout-algorithms-using-ipc/">Weekly Check-in #6</a></td> </tr> <tr class="odd"> <td>Week 7<br>(19-07-2021)</td> <td>Helios IPC network layout algorithm support for macOS; smooth animations for IPC layouts; ForceAtlas2 network layout using cugraph/cuda</td> <td><a href="https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-7-14/">Weekly Check-in #7</a></td> </tr> <tr class="even"> <td>Week 8<br>(26-07-2021)</td> <td>Helios CI, Helios documentation</td> <td><a href="https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-8-9/">Weekly Check-in #8</a></td> </tr> <tr class="odd"> <td>Week 9<br>(02-08-2021)</td> <td>Helios documentation; improved the examples and documentation of the WebRTC streaming system and made some compatibility improvements, removing some dependencies</td> <td><a href="https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-9-16/">Weekly Check-in #9</a></td> </tr> <tr class="even"> <td>Week 10<br>(09-08-2021)</td> <td>Helios documentation improvements; found and fixed a bug in FURY w.r.t.
the time management system; improved the memory management system for the network layout algorithms using IPC</td> <td><a href="https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-10-12/">Weekly Check-in #10</a></td> </tr> <tr class="odd"> <td>Week 11<br>(16-08-2021)</td> <td>Created a PR that allows FURY to draw hundreds of thousands of characters without any expensive GPU; fixed the flickering effect on the streaming system; Helios node labels feature; finalizing remaining PRs</td> <td><a href="https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-11-13/">Weekly Check-in #11</a></td> </tr> </tbody> </table> <p>Detailed weekly tasks, progress and work done can be found <a href="https://blogs.python-gsoc.org/en/demvessiass-blog/">here</a>.</p> <h3>References</h3> <p>[1] Python GSoC - Post #1: A Stadia-like system for data visualization - demvessias's Blog, n.d. https://blogs.python-gsoc.org/en/demvessiass-blog/post-1-a-stadia-like-system-for-data-visualization/</p> <p>[2] Python GSoC - Post #2: SOLID, monkey patching a python issue and network layouts through WebRTC - demvessias's Blog, n.d. https://blogs.python-gsoc.org/en/demvessiass-blog/post-2-solid-monkey-patching-a-python-issue-and-network-layouts-through-webrtc/</p> <p>[3] Python GSoC - Post #3: Network layout algorithms using IPC - demvessias's Blog, n.d. https://blogs.python-gsoc.org/en/demvessiass-blog/post-3-network-layout-algorithms-using-ipc/</p> <p>[4] Rougier, N.P., 2018. An open access book on Python, OpenGL and Scientific Visualization [WWW Document]. URL https://github.com/rougier/python-opengl (accessed 8.21.21).</p> <p>[5] Green, C., 2007. Improved alpha-tested magnification for vector textures and special effects, in: ACM SIGGRAPH 2007 Courses on - SIGGRAPH '07. Presented at the ACM SIGGRAPH 2007 courses, ACM Press, San Diego, California, p. 9.
https://doi.org/10.1145/1281500.1281665</p>
messias.physics@gmail.com (demvessias)
Mon, 23 Aug 2021 19:45:51 +0000
https://blogs.python-gsoc.org/en/demvessiass-blog/google-summer-of-code-final-work-product-3/

Weekly Check-in #11
https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-11-13/

Hi everyone! My name is Bruno Messias. Currently I'm a Ph.D. student at USP/Brazil. This summer I'll develop new tools and features for FURY-GL. Specifically, I'll focus on developing a system for collaborative visualization of large network layouts using FURY and VTK. <h2>What did I do this week?</h2> <h4> FURY </h4> <ul> <li> <a href="https://github.com/fury-gl/fury/pull/489"> PR fury-gl/fury#489: </a> <p> I've created the PR that will allow FURY to draw hundreds of thousands of labels using texture maps. By default, this PR gives FURY three pre-built texture maps using different fonts; however, it is quite easy to create new fonts to be used in a visualization. It was quite hard to develop the shader code and find the correct positions of the texture maps to be used in the shader, because we used freetype-py to generate the texture and pack the glyphs, and that lib has some examples with bugs. Fortunately, everything is now working in FURY. I've also created two different examples to show how this PR works. </p><ul> <li> <p> The first example, viz_huge_amount_of_labels.py, shows that the feature has really good performance: the user can draw hundreds of thousands of characters on a regular computer. </p> <img src="https://user-images.githubusercontent.com/6979335/129643743-6cb12c06-3415-4a02-ba43-ccc97003b02d.png"> </li> <li> The second example, viz_billboad_labels.py, shows the different behaviors of the label actor. In addition, it shows the user how to create a new texture-atlas font to be used across different visualizations.
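<p>As a rough illustration of the texture-atlas idea behind these examples (a simplified sketch with hypothetical names, not FURY's actual implementation), glyph bitmaps can be packed side by side into a single image while recording each glyph's normalized UV rectangle for the shader:</p>

```python
# Simplified texture-atlas packing: place glyph bitmaps side by side in one
# row and record each glyph's normalized UV rectangle for use in a shader.
# Illustrative sketch only; names and layout are hypothetical.

def pack_atlas(glyphs):
    """glyphs: dict mapping char -> 2D list of grayscale rows (equal height)."""
    height = max(len(bitmap) for bitmap in glyphs.values())
    width = sum(len(bitmap[0]) for bitmap in glyphs.values())
    atlas = [[0] * width for _ in range(height)]
    uvs = {}
    x = 0
    for char, bitmap in glyphs.items():
        w = len(bitmap[0])
        for row_idx, row in enumerate(bitmap):
            atlas[row_idx][x:x + w] = row
        # Normalized UV rectangle: (u0, v0, u1, v1)
        uvs[char] = (x / width, 0.0, (x + w) / width, len(bitmap) / height)
        x += w
    return atlas, uvs

# Tiny fake 2x2 glyph bitmaps standing in for freetype-py renderings.
glyphs = {
    "A": [[0, 255], [255, 255]],
    "B": [[255, 0], [0, 255]],
}
atlas, uvs = pack_atlas(glyphs)
```

In a real pipeline the bitmaps would come from a font rasterizer such as freetype-py, and the UV rectangles would be uploaded alongside the atlas texture so the shader can sample each glyph.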
</li> </ul> <p></p> </li> <li> <a href="https://github.com/fury-gl/fury/pull/437"> PR fury-gl/fury#437: </a> <ul> <li> <h5>Fix: avoid multiple OpenGL contexts on Windows using asyncio</h5> The streaming system must be generic, but OpenGL and VTK behave in unique ways on each operating system; thus, it can be tricky to get the same behavior across different OSes. One hard issue that we found is that it was not possible to use my TimeIntervals objects (implemented with the threading module) with VTK, because on Windows VTK cannot be used from different threads. Fortunately, moving from the threading (multithreading) approach to asyncio (concurrency) fixed this issue, and now the streaming system is ready to be used anywhere. </li> <li> <h5>Flickering</h5> Finally, I found the cause of the flickering effect on the streaming system. This flickering appeared only when the streaming was created using the Widget object. The cause seems to be a bug or a strange behavior from VTK: calling <code>iren.MouseWheelForwardEvent()</code> or <code>iren.MouseWheelBackwardEvent()</code> inside a thread without invoking the Start method of a VTK instance produces memory corruption. Fortunately, I could fix this behavior and now the streaming system works without this glitch effect. </li> </ul> </li> </ul> <h4> FURY/Helios </h4> <ul> <li> <a href="https://github.com/fury-gl/helios/pull/24"> PR fury-gl/helios#24: </a> <p>This uses <a href="https://github.com/fury-gl/fury/pull/489"> PR fury-gl/fury#489 </a> to bring the network label feature to Helios. It is possible to draw node labels, update the colors, and change the positions at runtime. In addition, when a network layout algorithm is running, the node label positions are automatically updated to follow the nodes across the screen.
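</p>
<p>The threading-to-asyncio move mentioned above for the TimeIntervals objects can be sketched roughly as follows. This is a generic illustrative pattern, not FURY's actual code: a periodic callback is scheduled on the event loop, so it always runs in the main thread instead of a secondary one.</p>

```python
import asyncio

# Illustrative periodic "time interval" using asyncio instead of threads:
# the callback runs in the main thread's event loop, avoiding calls into
# VTK from a secondary thread. Names are hypothetical, not FURY's API.
class AsyncIntervalTimer:
    def __init__(self, seconds, callback):
        self.seconds = seconds
        self.callback = callback
        self._task = None

    async def _run(self):
        while True:
            await asyncio.sleep(self.seconds)
            self.callback()  # runs on the event loop, i.e. the main thread

    def start(self):
        self._task = asyncio.ensure_future(self._run())

    def stop(self):
        if self._task is not None:
            self._task.cancel()

async def demo():
    ticks = []
    timer = AsyncIntervalTimer(0.01, lambda: ticks.append(1))
    timer.start()
    await asyncio.sleep(0.05)  # let a few intervals elapse
    timer.stop()
    return len(ticks)

n = asyncio.run(demo())
```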
</p> <img src="https://user-images.githubusercontent.com/6979335/129642582-fc6785d8-0e4f-4fdd-81f4-b2552e1ff7c7.png"> </li> <li> <a href="https://github.com/fury-gl/helios/pull/23"> PR fury-gl/helios#23: Merged. </a> This PR granted compatibility between IPC layouts and Windows. Besides that, it is now much easier to create new network layouts using inter-process communication. </li> </ul> <h2>Did I get stuck anywhere?</h2> I did not get stuck this week. <h2>What is coming up next?</h2> I'll discuss that with my mentors tomorrow.
messias.physics@gmail.com (demvessias)
Mon, 16 Aug 2021 23:26:34 +0000
https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-11-13/

Weekly Check-in #10
https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-10-12/

Hi everyone! My name is Bruno Messias. Currently I'm a Ph.D. student at USP/Brazil. This summer I'll develop new tools and features for FURY-GL. Specifically, I'll focus on developing a system for collaborative visualization of large network layouts using FURY and VTK. <h2>What did I do this week?</h2> <h4> FURY/Helios </h4> <ul> <li> <a href="https://github.com/fury-gl/helios/pull/22"> PR fury-gl/helios#22: </a> Helios documentation improvements. </li> <li> <a href="https://github.com/fury-gl/helios/pull/23"> PR fury-gl/helios#23: </a> A PR that makes the Helios IPCLayout system compatible with Windows. </li> </ul> <h4> FURY </h4> <ul> <li> <a href="https://github.com/fury-gl/fury/pull/484"> PR fury-gl/fury#484: I found and fixed a bug in FURY's time management system </a> </li> <li> <a href="https://github.com/fury-gl/fury/pull/437"> PR fury-gl/fury#437: </a> <ul> <li> Fixed the tests on Windows </li> <li> Improved the streaming memory management system for IPC communication </li> </ul> </li> <li> <p> I've been developing a feature that will allow FURY to draw hundreds of thousands of labels using texture maps and signed distance functions.
So far I have a sketch that is at least able to draw the labels using the marker billboards and bitmap fonts. </p> <img src="https://user-images.githubusercontent.com/6979335/128761833-53f53e2c-5bc0-4ff3-93c4-0ad01dc7d8eb.png" style="width: 80%;"> </li> <li> <a href="https://github.com/fury-gl/fury/pull/432"> PR fury-gl/fury#432: </a> minor improvements </li> <li> <a href="https://github.com/fury-gl/fury/pull/474">PR #474</a> Helped to review this PR </li> </ul> <h2>Did I get stuck anywhere?</h2> I did not get stuck this week. <h2>What is coming up next?</h2> I'll discuss that with my mentors tomorrow.
messias.physics@gmail.com (demvessias)
Mon, 09 Aug 2021 19:20:16 +0000
https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-10-12/

Weekly Check-in #9
https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-9-16/

Hi everyone! My name is Bruno Messias. Currently I'm a Ph.D. student at USP/Brazil. This summer I'll develop new tools and features for FURY-GL. Specifically, I'll focus on developing a system for collaborative visualization of large network layouts using FURY and VTK. <h2>What did I do this week?</h2> <h4> FURY/Helios </h4> <ul> <li> <a href="https://github.com/fury-gl/helios/pull/22"> PR fury-gl/helios#22: </a> Helios documentation improvements. I've spent some time studying Sphinx in order to discover how I could create a custom summary inside of a template module. </li> </ul> <h4> FURY </h4> <ul> <li> Added my GSoC blogs to the FURY blogs as requested by my mentors. </li> <li> <a href="https://github.com/fury-gl/fury/pull/437"> PR fury-gl/fury#437: </a> <ul> <li> Docstring improvements </li> <li> Covered more tests </li> <li> Covered tests using optional dependencies.
</li> <li> aiortc is no longer a mandatory dependency </li> <li> Improvements in memory management </li> </ul> </li> <li> <a href="https://github.com/fury-gl/fury/pull/432">PR #432</a> Fixed some typos, improved the tests and docstrings </li> <li> Helped to review and made some suggestions to the <a href="https://github.com/fury-gl/fury/pull/474">PR #474</a> made by @mehabhalodiya. </li> </ul> <h2>Did I get stuck anywhere?</h2> I did not get stuck this week. <h2>What is coming up next?</h2> I'll discuss that with my mentors tomorrow.
messias.physics@gmail.com (demvessias)
Mon, 02 Aug 2021 23:06:11 +0000
https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-9-16/

Weekly Check-In #8
https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-8-9/

Hi everyone! My name is Bruno Messias. Currently I'm a Ph.D. student at USP/Brazil. This summer I'll develop new tools and features for FURY-GL. Specifically, I'll focus on developing a system for collaborative visualization of large network layouts using FURY and VTK. <h2>What did I do this week?</h2> <ul> <li> <a href="https://github.com/fury-gl/helios/pull/18"> PR fury-gl/helios#18 (merged): </a>Helios documentation <p> I've been working on the Helios documentation. It's now available online at <a href="https://fury-gl.github.io/helios-website">https://fury-gl.github.io/helios-website</a> <img src="https://fury-gl.github.io/helios-website/_images/logo.png" style="width: 100%;"></p></li> <li> <a href="https://github.com/fury-gl/helios/pull/17"> PR fury-gl/helios#17 (merged): </a>Helios CI for tests and code coverage </li> </ul> <h2>Did I get stuck anywhere?</h2> I did not get stuck this week.
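<p>As a side note on making aiortc non-mandatory (mentioned above), the usual optional-dependency pattern looks something like the sketch below. This is a generic illustration with a hypothetical entry-point name, not FURY's exact code:</p>

```python
# Generic optional-dependency pattern: import the package if available and
# raise a helpful error only when the optional feature is actually used.
# Illustrative sketch; `start_webrtc_stream` is a hypothetical name.
try:
    import aiortc  # noqa: F401
    HAVE_AIORTC = True
except ImportError:
    HAVE_AIORTC = False

def start_webrtc_stream():
    if not HAVE_AIORTC:
        raise ImportError(
            "aiortc is required for WebRTC streaming; "
            "install it with `pip install aiortc`"
        )
    # ... set up the WebRTC connection here ...
```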
<h2>What is coming up next?</h2> I'll discuss that with my mentors tomorrow.
messias.physics@gmail.com (demvessias)
Mon, 26 Jul 2021 14:10:44 +0000
https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-8-9/

Weekly Check-In #7
https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-7-14/

Hi everyone! My name is Bruno Messias. Currently I'm a Ph.D. student at USP/Brazil. This summer I'll develop new tools and features for FURY-GL. Specifically, I'll focus on developing a system for collaborative visualization of large network layouts using FURY and VTK. <h2>What did I do this week?</h2> <ul> <li> <a href="https://github.com/fury-gl/helios/pull/16"> PR fury-gl/helios#16 (merged): </a>Helios IPC network layout support for macOS </li> <li> <a href="https://github.com/fury-gl/helios/pull/17"> PR fury-gl/helios#17 (merged): </a>Smooth animations for IPC network layout algorithms <p> Before this commit it was not possible to record the positions and get smooth animations with the IPCLayout approach. See the animation below.</p> <img src="https://user-images.githubusercontent.com/6979335/126175596-e6e2b415-bd79-4d99-82e7-53e10548be8c.gif"> <p> After this PR it is possible to tell Helios to store the evolution of the network positions using the record_positions parameter. This parameter should be passed to the start method. Notice in the image below how this gives us a better visualization.</p> <img src="https://user-images.githubusercontent.com/6979335/126175583-c7d85f0a-3d0c-400e-bbdd-4cbcd2a36fed.gif"> </li> <li> <a href="https://github.com/fury-gl/helios/pull/13"> PR fury-gl/helios#13 (merged) </a>Merged the ForceAtlas2 cugraph layout algorithm </li> </ul> <h2>Did I get stuck anywhere?</h2> I did not get stuck this week. <h2>What is coming up next?</h2> Probably, I'll work more on Helios. Specifically, I want to improve the memory management system.
It seems that some shared memory resources are not being released when using the IPCLayout approach.
messias.physics@gmail.com (demvessias)
Mon, 19 Jul 2021 13:18:00 +0000
https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-7-14/

Post #3: Network layout algorithms using IPC
https://blogs.python-gsoc.org/en/demvessiass-blog/post-3-network-layout-algorithms-using-ipc/

<p> Hi all. In the past weeks, I've been focusing on developing Helios, the network visualization library for FURY. I improved the visual aspects of the network rendering and implemented the most relevant network layout methods. </p> <p> In this post I will discuss the most challenging task that I faced while implementing those new network layout methods and how I solved it. </p> <h3>The problem: network layout algorithm implementations with a blocking behavior </h3> <p> <strong>Case 1:</strong> Suppose that you need to monitor a hashtag and build a social graph. You want to interact with the graph and at the same time get insights about the structure of the user interactions. To get those insights you can perform a node embedding using any kind of network layout algorithm, such as force-directed or minimum-distortion embeddings. </p> <p> <strong>Case 2:</strong> Suppose that you are modelling a network dynamic such as an epidemic spreading or a Kuramoto model. In some of those network dynamics a node can change its state and the edges related to the node must be deleted. For example, in an epidemic model a node can represent a person who died due to a disease. Consequently, the layout of the network must be recomputed to give better insights. </p> <p> In the described cases, if we want a better user experience (UX) and at the same time a more practical and insightful application of Helios, the layout algorithms shouldn't block any computation in the main thread.
</p> <p> In Helios we already have a lib written in C (with a Python wrapper) which performs the force-directed layout algorithm using separate threads, avoiding the GIL problem and consequently avoiding the blocking. But what about the other open-source network layout libs available on the internet? Unfortunately, most of those libs have not been implemented like Helios's force-directed methods and consequently, if we want to update the network layout, the Python interpreter will block the computation and user interaction in your network visualization. How can we solve this problem? </p> <h3> Why is using Python threading not a good solution? </h3> <p> One solution to remove the blocking behavior of network layout libs like PyMDE is to use the threading module from Python. However, remember the GIL problem: only one thread can execute Python code at once. Therefore, this solution will be unfeasible for networks with more than a few hundred nodes, or even fewer! OK, then how can we solve it well? </p> <h3>IPC using Python</h3> <p> As I said in my previous posts, I've created a streaming system for data visualization for FURY using WebRTC. The streaming system is already working, and an important piece in this system was implemented using the Python SharedMemory from multiprocessing. We can reuse the same ideas from the streaming system to remove the blocking behavior of the network layout libs. </p> <p> My solution to have PyMDE and CuGraph-ForceAtlas without blocking was to break the network layout method into two different types of processes: A and B. The list below describes the most important behaviors and responsibilities for each process. </p> <p> <strong>Process A:</strong> </p><ul> <li> Where the visualization (NetworkDraw) will happen </li> <li> Create the shared memory resources: edges, weights, positions, info..
</li> <li> Check if process B has updated the shared memory resource that stores the positions, using the timestamp stored in the info_buffer </li> <li> Update the positions inside the NetworkDraw instance </li> </ul> <p> <strong> Process B: </strong> </p><ul> <li> Read the network information stored in the shared memory resources: edges, weights, positions </li> <li> Execute the network layout algorithm </li> <li> Update the position values inside the shared memory resource </li> <li> Update the timestamp inside the shared memory resource </li></ul> <p> I used the timestamp information to avoid unnecessary updates in the FURY/VTK window instance, which can consume a lot of computational resources. </p> <h4> How have I implemented the code for A and B? </h4> <p> Because we need to deal with a lot of different data and share it between different processes, I've created a set of tools for that; take a look, for example, at the <a href="https://github.com/fury-gl/helios/blob/main/helios/layouts/ipc_tools.py#L111"> ShmManagerMultiArrays object</a>, which makes the memory management less painful. </p> <p> I'm breaking the layout method into two different processes. Thus I've created two abstract objects to deal with any kind of network layout algorithm that must be performed using inter-process communication (IPC). Those objects are <a href="https://github.com/devmessias/helios/blob/a0a24525697ec932a398db6413899495fb5633dd/helios/layouts/base.py#L65"> NetworkLayoutIPCServerCalc</a>, used by processes of type B, and <a href="https://github.com/devmessias/helios/blob/a0a24525697ec932a398db6413899495fb5633dd/helios/layouts/base.py#L135"> NetworkLayoutIPCRender</a>, which should be used by processes of type A. </p> <p> I'll not bore you with the details of the implementation, but let's take a look at some important points. As I've said, the timestamp is saved after each step of the network layout algorithm.
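</p>
<p>A minimal single-process sketch of this shared-memory scheme (illustrative names, much simplified compared to Helios's actual buffers): the "B" side writes the positions and then bumps a timestamp, and the "A" side copies the positions only when the timestamp has changed:</p>

```python
import struct
import time
from multiprocessing import shared_memory

# Sketch of the A/B shared-memory protocol (simplified, hypothetical names):
# an info buffer holds one timestamp; a positions buffer holds n*2 doubles.
# Both roles run in one process here, just to show the buffer layout.
N_NODES = 2
info = shared_memory.SharedMemory(create=True, size=8)
pos = shared_memory.SharedMemory(create=True, size=N_NODES * 2 * 8)

def layout_step(coords):
    """Process 'B' role: write new positions, then update the timestamp."""
    struct.pack_into(f"{len(coords)}d", pos.buf, 0, *coords)
    struct.pack_into("d", info.buf, 0, time.monotonic())

def check_and_sync(last_ts):
    """Process 'A' role: copy positions only if the timestamp changed."""
    ts = struct.unpack_from("d", info.buf, 0)[0]
    if ts == last_ts:
        return None, last_ts  # nothing new; skip the expensive update
    coords = list(struct.unpack_from(f"{N_NODES * 2}d", pos.buf, 0))
    return coords, ts

layout_step([0.0, 1.0, 2.0, 3.0])
coords, ts = check_and_sync(0.0)   # first sync: positions are copied
again, ts = check_and_sync(ts)     # no new layout step: returns None

pos.close(); pos.unlink()
info.close(); info.unlink()
```

In the real system the two roles live in separate processes that attach to the same named segments, so the layout loop never touches the interpreter running the VTK/FURY window.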
Take a look at the method _check_and_sync from NetworkLayoutIPCRender <a href="https://github.com/fury-gl/helios/blob/a0a24525697ec932a398db6413899495fb5633dd/helios/layouts/base.py#L266"> here</a>. Notice that the update happens only if the stored timestamp has changed. Also, look at this line <a href="https://github.com/fury-gl/helios/blob/a0a24525697ec932a398db6413899495fb5633dd/helios/layouts/mde.py#L180">helios/layouts/mde.py#L180</a> in the IPC-PyMDE implementation. This line writes the value 1 into the second element of the info_buffer; that value is used to inform process A that everything worked well. I used that info, for example, in the tests for the network layout method, see <a href="https://github.com/fury-gl/helios/blob/a0a24525697ec932a398db6413899495fb5633dd/helios/tests/test_mde_layouts.py#L43"> helios/tests/test_mde_layouts.py#L43 </a> </p> <h3>Results</h3> <p> Until now, Helios has three network layout methods implemented: Force-Directed, Minimum-Distortion Embeddings and ForceAtlas2. Here <a href="https://github.com/fury-gl/helios/blob/a0a24525697ec932a398db6413899495fb5633dd/docs/examples/viz_helios_mde.ipynb"> docs/examples/viz_helios_mde.ipynb </a> you can get a jupyter notebook that I've created showing how to use MDE with IPC in Helios. </p> <p> In the animation below we can see the result of the Helios-MDE application on a network with a set of anchored nodes. </p> <img src="https://user-images.githubusercontent.com/6979335/125310065-a3a9f480-e308-11eb-98d9-0ff5406a0e96.gif"> <h3>Next steps </h3> <p> I'll probably focus on the Helios network visualization system, improving the documentation and testing ForceAtlas2 on a computer with CUDA installed.
See the list of open <a href="https://github.com/fury-gl/helios/issues">issues</a> </p> <h3>Summary of the most important pull requests: </h3> <ul> <li> IPC tools for network layout methods (helios issue #7) <a href="https://github.com/fury-gl/helios/pull/6"> fury-gl/helios/pull/6 </a> </li> <li> New network layout methods for fury (helios issue #7) <a href="https://github.com/fury-gl/helios/pull/9">fury-gl/helios/pull/9</a> <a href="https://github.com/fury-gl/helios/pull/14">fury-gl/helios/pull/14</a> <a href="https://github.com/fury-gl/helios/pull/13">fury-gl/helios/pull/13</a> </li> <li> Improved the visual aspects and configuration of the network rendering (helios issue #12) <a href="https://github.com/devmessias/helios/tree/fury_network_actors_improvements"> https://github.com/devmessias/helios/tree/fury_network_actors_improvements </a> </li> <li> Tests, examples and documentation for Helios (helios issues #3 and #4) <a href="https://github.com/fury-gl/helios/pull/5">fury-gl/helios/pull/5</a> </li> <li> Reduced the flickering effect on the FURY/Helios streaming system <a href="https://github.com/fury-gl/helios/pull/10">fury-gl/helios/pull/10</a> <a href="https://github.com/fury-gl/fury/pull/437/commits/a94e22dbc2854ec87b8c934f6cabdf48931dc279">fury-gl/fury/pull/437/commits/a94e22dbc2854ec87b8c934f6cabdf48931dc279</a> </li></ul>messias.physics@gmail.com (demvessias)Mon, 12 Jul 2021 15:11:52 +0000https://blogs.python-gsoc.org/en/demvessiass-blog/post-3-network-layout-algorithms-using-ipc/Weekly Check-In #5https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-5-14/<h2>What did you do this week?</h2> <h3> <a href="https://github.com/fury-gl/fury/pull/437"> fury-gl/fury PR#437: WebRTC streaming system for FURY </a> </h3> <ul> <li> Before the <a href="https://github.com/fury-gl/fury/pull/437/commits/8c670c284368029cdb5b54c178a792ec615e4d4d"> 8c670c2 </a> commit, for some versions of MacOS the streaming system was failing silently.
I’ve spent a lot of time researching the cause of this. Fortunately, I found both the cause and the solution. MacOS was failing silently because the SharedMemory Object was creating a memory resource with at least 4096 bytes, independent of whether I had requested less than that. If we look at the MultiDimensionalBuffer Object (stream/tools.py) before the 8c670c2 commit, we can see that this Object has a max_size parameter which needs to be updated if the SharedMemory was created with a "wrong" size. </li> </ul> <h3> <a href="https://github.com/fury-gl/helios/pull/1"> fury-gl/helios PR 1: Network Layout and SuperActors </a> </h3> <p> In the past week I've made a lot of improvements in this PR, from performance improvements to visual effects. Below is the list of tasks related to this PR: </p> <ul> <li> Code refactoring. </li> <li> Visual improvements: using the UniformTools from my pull request <a href="https://github.com/fury-gl/fury/pull/424">#424</a>, it is now possible to control all the visual characteristics at runtime. </li> <li> 2D layout: while 3D network representations are very useful for exploring a dataset, it is hard to convince a group of network scientists to use a visualization system which doesn't allow 2D representations. Because of that, I started coding the 2D behavior in the network visualization system. </li> <li> Minimum-Distortion Embeddings examples: I've created some examples which show how to integrate pymde (Python Minimum-Distortion Embeddings) with fury/helios. The image below shows the result of this integration: a "perfect" graph embedding </li> </ul> <img src="https://user-images.githubusercontent.com/6979335/124524052-da937e00-ddcf-11eb-83ca-9b58ca692c2e.png"> <h2> What is coming up next week? </h2> <p> I'll probably focus on the <a href="">helios PR#1</a>. Specifically, writing tests and improving the minimum-distortion embedding layout. </p> <h2> Did you get stuck anywhere?
</h2> I did not get stuck this week.messias.physics@gmail.com (demvessias)Mon, 05 Jul 2021 21:51:45 +0000https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-5-14/Post #2: SOLID, monkey patching a python issue and network layouts through WebRTChttps://blogs.python-gsoc.org/en/demvessiass-blog/post-2-solid-monkey-patching-a-python-issue-and-network-layouts-through-webrtc/<p>Hi everyone! My name is Bruno Messias and I'm a PhD student working with graphs and networks. This summer I'll develop new tools and features for FURY-GL. Specifically, I'll focus on developing a system for collaborative visualization of large network layouts using FURY and VTK.</p> <p>These past two weeks I've spent most of my time on the <a href="https://github.com/fury-gl/fury/pull/437">Streaming System PR</a> and the <a href="https://github.com/fury-gl/helios/pull/1/">Network Layout PR</a>. In this post I'll focus on the most relevant things I've done for those PRs.</p> <h2>Streaming System</h2> <p><strong>Pull request <a href="https://github.com/fury-gl/fury/pull/437/"> fury-gl/fury/pull/437</a></strong></p> <h3>Code Refactoring</h3> <h4>Abstract class and SOLID</h4> <p> The past weeks I've spent some time refactoring the code. To see what I've done, let's take a look at <a href="https://github.com/devmessias/fury/blob/b1e985bd6a0088acb4a116684577c4733395c9b3/fury/stream/client.py#L20">fury/blob/b1e985.../fury/stream/client.py#L20</a>, the FuryStreamClient Object before the refactoring. </p> <p> The code is a mess.
To see why this code is not good according to SOLID principles, let's just list all the responsibilities of FuryStreamClient: </p><ul> <li>Creates a RawArray or SharedMemory to store the n-buffers</li> <li>Creates a RawArray or SharedMemory to store the information about each buffer</li> <li>Cleans up the shared memory resources if SharedMemory was used</li> <li>Writes the vtk buffer into the shared memory resource</li> <li>Creates the vtk callbacks to update the vtk-buffer</li> </ul> <p> That's a lot, and those responsibilities are not even related to each other. How can we be more SOLID[1]? An obvious solution is to create a specific object to deal with the shared memory resources. But it's not good enough, because we still have poor generalization: this new object still needs to deal with different memory management systems, rawarray or shared memory (maybe sockets in the future). Fortunately, we can use python Abstract Classes[2] to organize the code.</p> <p>To use the ABC from python I first listed all the behaviors that should be mandatory in the new abstract class. If we are using SharedMemory or RawArrays we first need to create the memory resource in a proper way. Therefore, the GenericImageBufferManager must have an abstract method create_mem_resource. Now take a look at the ImageBufferManager inside of <a href="https://github.com/devmessias/fury/blob/c196cf43c0135dada4e2c5d59d68bcc009542a6c/fury/stream/server/server.py#L40">stream/server/server.py</a>; sometimes it is necessary to load the memory resource in a proper way. Because of that, the GenericImageBufferManager needs to have a load_mem_resource abstract method. Finally, each type of ImageBufferManager should have a different cleanup method. The code below presents a sketch of the abstract class. </p> <pre><code>from abc import ABC, abstractmethod

class GenericImageBufferManager(ABC):
    def __init__(
            self, max_window_size=None, num_buffers=2, use_shared_mem=False):
        ...
        # ...
    @abstractmethod
    def load_mem_resource(self):
        pass

    @abstractmethod
    def create_mem_resource(self):
        pass

    @abstractmethod
    def cleanup(self):
        pass
</code> </pre> <p> Now we can look for those behaviors inside of FuryStreamClient.py and ImageBufferManager.py that do not depend on whether we are using SharedMemory or RawArrays. These behaviors should be methods inside of the new GenericImageBufferManager. </p> <pre><code># code at: https://github.com/devmessias/fury/blob/440a39d427822096679ba384c7d1d9a362dab061/fury/stream/tools.py#L491

class GenericImageBufferManager(ABC):
    def __init__(
            self, max_window_size=None, num_buffers=2, use_shared_mem=False):
        self.max_window_size = max_window_size
        self.num_buffers = num_buffers
        self.info_buffer_size = num_buffers*2 + 2
        self._use_shared_mem = use_shared_mem
        # omitted code

    @property
    def next_buffer_index(self):
        index = int((self.info_buffer_repr[1]+1) % self.num_buffers)
        return index

    @property
    def buffer_index(self):
        index = int(self.info_buffer_repr[1])
        return index

    def write_into(self, w, h, np_arr):
        buffer_size = int(h*w)
        next_buffer_index = self.next_buffer_index
        # omitted code

    def get_current_frame(self):
        if not self._use_shared_mem:
            # omitted code
            pass
        return self.width, self.height, self.image_buffer_repr

    def get_jpeg(self):
        width, height, image = self.get_current_frame()
        if self._use_shared_mem:
            # omitted code
            pass
        return image_encoded.tobytes()

    async def async_get_jpeg(self, ms=33):
        # omitted code
        pass

    @abstractmethod
    def load_mem_resource(self):
        pass

    @abstractmethod
    def create_mem_resource(self):
        pass

    @abstractmethod
    def cleanup(self):
        pass
</code> </pre> <p> With the <a href="https://github.com/devmessias/fury/blob/440a39d427822096679ba384c7d1d9a362dab061/fury/stream/tools.py#L491"> GenericImageBufferManager</a>, the <a href="https://github.com/devmessias/fury/blob/440a39d427822096679ba384c7d1d9a362dab061/fury/stream/tools.py#L609"> RawArrayImageBufferManager</a> and <a
href="https://github.com/devmessias/fury/blob/440a39d427822096679ba384c7d1d9a362dab061/fury/stream/tools.py#L681">SharedMemImageBufferManager</a> are now implemented with less code duplication (DRY principle). This makes the code more readable and makes it easier to find bugs. In addition, we can later implement other memory management systems in the streaming system without modifying the behavior of FuryStreamClient or the code inside of server.py. </p> <p> I've also applied the same SOLID principles to improve the CircularQueue object. Although the CircularQueue and FuryStreamInteraction were not violating the S from SOLID, the head-tail buffer from the CircularQueue must have a way to lock the write/read if the memory resource is busy. While <a href="https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Array">multiprocessing.Array</a> already has a context which allows locking (.get_lock()), SharedMemory doesn't[2]. The use of an abstract class allowed me to deal with those peculiarities. <a href="https://github.com/fury-gl/fury/pull/437/commits/358402ea2f06833f66f45f3818ccc3448b2da9cd"> commit 358402e</a> </p> <h4>Using namedtuples to grant immutability and to avoid silent bugs</h4> The circular queue and the user interaction are implemented in the streaming system using numbers to identify the type of event (mouse click, mouse wheel, ...) and where to store the specific values associated with the event, for example whether the ctrl key is pressed or not. Therefore, those numbers appear in different files and locations: tests/test_stream.py, stream/client.py, stream/server/app_async.py. This can be problematic because a typo can create a silent bug. One possibility to mitigate this is to use a python dictionary to store the constant values, for example <pre><code>EVENT_IDS = {
    "mouse_move": 2,
    "mouse_weel": 1,
    # ...
}</code> </pre> But this solution has another issue: anywhere in the code we can change the values of EVENT_IDS, and this will produce a new silent bug. To avoid this, I chose to use <a href="https://docs.python.org/3/library/collections.html#collections.namedtuple">namedtuples</a> to create an immutable object which holds all the constant values associated with the user interactions: <a href="https://github.com/devmessias/fury/blob/b1e985bd6a0088acb4a116684577c4733395c9b3/fury/stream/constants.py#L59"> stream/constants.py</a> <p> The namedtuple has several advantages over dictionaries for this specific situation. In addition, it has better performance. A good tutorial about namedtuples is available here: <a href="https://realpython.com/python-namedtuple/">https://realpython.com/python-namedtuple/</a> </p><h3>Testing</h3> <p>My mentors asked me to write tests for this PR. Therefore, this past week I've implemented the most important tests for the streaming system: <a href="https://github.com/devmessias/fury/blob/440a39d427822096679ba384c7d1d9a362dab061/fury/tests/test_stream.py">/fury/tests/test_stream.py</a> </p><h3>Most relevant bugs</h3> As I discussed in my <a href="https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-3-15/">third week</a> check-in, there is an open issue related to SharedMemory in python.
This "bug" happens in the streaming system through the following scenario: <pre><code>1 - Process A creates a shared memory X
2 - Process A creates a subprocess B using popen (shell=False)
3 - Process B reads X
4 - Process B closes X
5 - Process A kills B
6 - Process A closes X
7 - Process A unlink()s the shared memory resource
</code></pre> In python, this scenario translates to <pre><code>from multiprocessing import shared_memory as sh
import time
import subprocess
import sys

shm_a = sh.SharedMemory(create=True, size=10000)
command_string = f"from multiprocessing import shared_memory as sh;import time;shm_b = sh.SharedMemory('{shm_a.name}');shm_b.close();"
time.sleep(2)
p = subprocess.Popen(
    [sys.executable, '-c', command_string],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=False)
p.wait()
print("\nSTDOUT")
print("=======\n")
print(p.stdout.read())
print("\nSTDERR")
print("=======\n")
print(p.stderr.read())
print("========\n")
time.sleep(2)
shm_a.close()
shm_a.unlink()
</code> </pre> Fortunately, I could use a monkey-patching[3] solution to fix that while we wait for the python-core team to fix the resource_tracker (38119) issue [4]. <h2>Network Layout (Helios-FURY)</h2> <p><strong>Pull request <a href="https://github.com/fury-gl/helios/pull/1/"> fury-gl/helios/pull/1</a></strong></p> <p>Finally, the first version of the FURY network layout is working, as you can see in the video below.</p> <iframe height="315" src="https://www.youtube.com/embed/aN1mUPRHoqM" width="560"></iframe> <p>In addition, this can already be used with the streaming system, allowing user interactions across the internet with the WebRTC protocol.</p> <p>One of the issues that I had to solve to achieve the result presented in the video above was to find a way to update the positions of the vtk objects without blocking the main thread while still allowing the vtk event calls.
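<p> A minimal sketch of such a non-blocking periodic update, built on python's threading.Timer; the IntervalTimer name and API here are illustrative, not fury's actual implementation: </p>

```python
# Illustrative repeating timer: runs `callback` every `seconds` seconds in a
# background thread, leaving the main thread (and the vtk event loop) free.
import threading


class IntervalTimer:
    def __init__(self, seconds, callback, *args, **kwargs):
        self._seconds = seconds
        self._callback = callback
        self._args, self._kwargs = args, kwargs
        self._timer = None
        self._running = False

    def start(self):
        self._running = True
        self._schedule()

    def _schedule(self):
        if not self._running:
            return
        self._timer = threading.Timer(self._seconds, self._run)
        self._timer.daemon = True  # don't keep the interpreter alive
        self._timer.start()

    def _run(self):
        self._callback(*self._args, **self._kwargs)
        self._schedule()  # re-arm for the next tick

    def stop(self):
        self._running = False
        if self._timer is not None:
            self._timer.cancel()
```

<p> The callback (for example, copying new positions into the vtk buffer and requesting a render) runs in a background thread at a fixed interval, so the main thread stays free to process vtk events. </p>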
My solution was to define an interval timer using the python threading module: <a href="https://github.com/devmessias/fury/blob/440a39d427822096679ba384c7d1d9a362dab061/fury/stream/tools.py#L776">/fury/stream/tools.py#L776</a>, <a href="https://github.com/devmessias/fury/blob/440a39d427822096679ba384c7d1d9a362dab061/fury/stream/client.py#L112">/fury/stream/client.py#L112</a>, <a href="https://github.com/devmessias/fury/blob/440a39d427822096679ba384c7d1d9a362dab061/fury/stream/client.py#L296">/fury/stream/client.py#L296</a> </p> <h2>Refs:</h2> <ul> <li>[1] A. Souly, "5 Principles to write SOLID Code (examples in Python)," Medium, Apr. 26, 2021. https://towardsdatascience.com/5-principles-to-write-solid-code-examples-in-python-9062272e6bdc (accessed Jun. 28, 2021).</li> <li>[2] "[Python-ideas] Re: How to prevent shared memory from being corrupted?" https://www.mail-archive.com/python-ideas@python.org/msg22935.html (accessed Jun. 28, 2021).</li> <li>[3] "Message 388287 - Python tracker." https://bugs.python.org/msg388287 (accessed Jun. 28, 2021). </li> <li>[4] "bpo-38119: Fix shmem resource tracking by vinay0410 · Pull Request #21516 · python/cpython," GitHub. https://github.com/python/cpython/pull/21516 (accessed Jun. 28, 2021). </li> </ul>messias.physics@gmail.com (demvessias)Mon, 28 Jun 2021 13:28:18 +0000https://blogs.python-gsoc.org/en/demvessiass-blog/post-2-solid-monkey-patching-a-python-issue-and-network-layouts-through-webrtc/Weekly Check-In #3https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-3-15/Hi everyone! My name is Bruno Messias. This summer I'll develop new tools and features for FURY-GL. Specifically, I'll focus on developing a system for collaborative visualization of large network layouts using FURY and VTK.
<h2>What did you do this week?</h2> <iframe height="315" src="https://www.youtube.com/embed/aN1mUPRHoqM" width="560"></iframe> <ul> <li> <a href="https://github.com/fury-gl/fury/pull/422/commits/8a0012b66b95987bafdb71367a64897b25c89368"> PR fury-gl/fury#422 (merged): </a>Integrated the 3d impostor spheres with the marker actor </li> <li> <a href="https://github.com/fury-gl/fury/pull/422"> PR fury-gl/fury#422 (merged): </a>Fixed some issues in my marker PR, which is now merged into fury </li> <li> <a href="https://github.com/fury-gl/fury/pull/432"> PR fury-gl/fury#432 </a>I've made some improvements in my PR which can be used to fine-tune the opengl state on VTK </li> <li> <a href="https://github.com/fury-gl/fury/pull/437"> PR fury-gl/fury#437 </a>I've made several improvements to my streamer proposal for FURY; most of those improvements are related to memory management. Using SharedMemory from python 3.8, it's now possible to use the streamer directly in a jupyter notebook without blocking </li> <li> <a href="https://github.com/fury-gl/helios/pull/1"> PR fury-gl/helios#1 </a> First version of the async network layout using the force-directed algorithm. </li> </ul> <h2>Did I get stuck anywhere?</h2> <h3>A python-core issue</h3> <p> I've spent some hours trying to track down this issue, but now it's solved through the commit <a href="https://github.com/devmessias/fury/commit/071dab85a86ec4f97eba36721b247ca9233fd59e">devmessias/fury/commit/071dab85</a> </p> <p> The <a href="https://docs.python.org/3/library/multiprocessing.shared_memory.html">SharedMemory</a> from python&gt;=3.8 offers a new way to share memory resources between unrelated processes. One of the advantages of using SharedMemory instead of the RawArray from multiprocessing is that SharedMemory allows sharing memory blocks between processes that are not related through a fork or spawn method.
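<p> A small self-contained sketch of this attach-by-name behavior; the "unrelated" process here is just a fresh interpreter spawned with -c for brevity. The try/except around unlink() is there because, on some python versions, the child's resource_tracker may already have unlinked the block at exit. </p>

```python
# Sketch: process A creates a named shared memory block; an unrelated
# interpreter (process B) attaches to it using only the name.
import subprocess
import sys
from multiprocessing import shared_memory

# process A: create a block and write into it
shm_a = shared_memory.SharedMemory(create=True, size=16)
shm_a.buf[:5] = b'hello'

# process B: a separate interpreter attaches by name and reads the bytes
child_code = (
    "from multiprocessing import shared_memory;"
    f"shm_b = shared_memory.SharedMemory('{shm_a.name}');"
    "print(bytes(shm_b.buf[:5]).decode());"
    "shm_b.close()"
)
out = subprocess.run(
    [sys.executable, '-c', child_code],
    capture_output=True, text=True)
print(out.stdout.strip())  # the child read the data through the name alone

shm_a.close()
try:
    shm_a.unlink()  # only the creator should unlink
except FileNotFoundError:
    # the child's resource_tracker may have unlinked it already
    pass
```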
The SharedMemory behavior allowed us to achieve our jupyter integration and <a href="https://github.com/fury-gl/fury/pull/437/files#diff-7680a28c3a88a93b8dae7b777c5db5805e1157365805eeaf2e58fd12a00df046"> simplifies the use of the streaming system</a>. However, I found an issue in the shared memory implementation. </p> Let's look at the following scenario: <pre><code>1 - Process A creates a shared memory X
2 - Process A creates a subprocess B using popen (shell=False)
3 - Process B reads X
4 - Process B closes X
5 - Process A kills B
6 - Process A closes X
7 - Process A unlink()s the shared memory resource X
</code></pre> This scenario should work well: unlink()ing X there is the right way, as discussed in the official python documentation. However, there is an open issue which I think is related to the above scenario. <ul> <li> <a href="https://bugs.python.org/issue38119">Issue: https://bugs.python.org/issue38119</a> </li> <li> <a href="https://github.com/python/cpython/pull/21516"> PR python/cpython/pull/21516</a> </li> </ul> Fortunately, I could use a <a href="https://bugs.python.org/msg388287">monkey-patching</a> solution to fix that while we wait for the python-core team to fix the resource_tracker (38119) issue. <h2>What is coming up next?</h2> I'm planning to work on <a href="https://github.com/fury-gl/fury/pull/432">fury-gl/fury#432</a> and <a href="https://github.com/fury-gl/helios/pull/1">fury-gl/helios#1</a>.messias.physics@gmail.com (demvessias)Mon, 21 Jun 2021 11:28:09 +0000https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-3-15/Post #1 - A Stadia-like system for data visualizationhttps://blogs.python-gsoc.org/en/demvessiass-blog/post-1-a-stadia-like-system-for-data-visualization/<p> Hi all! In this post I'll talk about the PR <a href="https://github.com/fury-gl/fury/pull/437">#437</a>. </p> <p> There are several reasons to have a streaming system for data visualization.
Because I’m doing a PhD in a developing country, I always need to think of the cheapest way to use the computational resources available. For example, with GPU prices increasing, it’s necessary to share a machine with a GPU between different users in different locations. Therefore, to convince my Brazilian friends to use FURY I need to code with that low-budget scenario in mind. </p> <p> To construct the streaming system for my project I’m thinking about the following properties and behaviors: </p> <ol> <li>I want to avoid blocking the code execution in the main thread (where the vtk/fury instance resides)</li> <li>The streaming should work inside of a low bandwidth environment</li> <li>I need an easy way to share the rendering result. For example, using the free version of ngrok</li> </ol> <p> To achieve property <strong>1.</strong> we need to circumvent the GIL problem. Using the threading module alone is not good enough, because we can’t use python threading for parallel CPU computation. In addition, to achieve a better organization it’s better to define the server system as an uncoupled module. Therefore, I believe that the multiprocessing lib in python will fit very well for our purposes. </p> <p> For the streaming system to work smoothly in a low-bandwidth scenario we need to choose the protocol wisely. In recent years the WebRTC protocol has been used in a myriad of applications, like google hangouts and Google Stadia, aiming at low latency behavior. Therefore, I chose webrtc as the first protocol to be available in the streaming system proposal. </p> <p> To achieve the third property, we must be economical in adding requirements and dependencies. </p> <p> Currently, the system has some issues, but it's already working. You can see some tutorials about how to use this streaming system <a href="https://github.com/devmessias/fury/tree/feature_fury_stream_client/docs/tutorials/04_stream">here</a>.
After running one of these examples you can easily share the results and interact with other users, for example using ngrok: </p><pre><code>./ngrok http 8000
</code> </pre> <p></p> <br> <h2>How does it work?</h2> <p> The image below is a simple representation of the streaming system. </p> <img alt="" src="https://user-images.githubusercontent.com/6979335/121934889-33ff1480-cd1e-11eb-89a4-562fbb953ba4.png"> <p> As you can see, the streaming system is made up of different processes that share some memory blocks with each other. One of the hardest parts of this PR was to code this sharing between different objects like VTK, numpy and the webserver. Next I'll discuss some of the technical issues that I had to learn about or circumvent. </p> <h3>Sharing data between processes</h3> We want to avoid any kind of unnecessary duplication of data or expensive copy/write actions. We can achieve this economy of computational resources using the multiprocessing module from python. <h4>multiprocessing RawArray </h4> <p> The <a href="https://docs.python.org/3/library/multiprocessing.html#multiprocessing.sharedctypes.RawArray"> RawArray </a> from multiprocessing allows sharing resources between different processes. However, there are some tricks to get better performance when dealing with RawArrays. For example, <a href="https://github.com/devmessias/fury/tree/6ae82fd239dbde6a577f9cccaa001275dcb58229"> take a look at my PR at an older stage. </a> At this older stage my streaming system was working well. However, one of my mentors (Filipi Nascimento) saw a huge latency for high-resolution examples. My first thought was that the latency was caused by the GPU-CPU copy from the opengl context. However, I discovered that I've been using RawArrays wrong my entire life!
<br> See for example this line of code <a href="https://github.com/devmessias/fury/blob/6ae82fd239dbde6a577f9cccaa001275dcb58229/fury/stream/client.py#L101"> fury/stream/client.py#L101 </a> The code below shows how I had been updating the raw arrays </p> <pre><code>raw_arr_buffer[:] = new_data
</code> </pre> <p>This works fine for small and medium-sized arrays, but for large ones it takes a large amount of time, more than the GPU-CPU copy. The explanation for this bad performance is available here: <a href="https://stackoverflow.com/questions/33853543/demystifying-sharedctypes-performance"> Demystifying sharedctypes performance. </a> The solution, which gives a stupendous performance improvement, is quite simple. RawArrays implement the buffer protocol. Therefore, we just need to use a memoryview: </p> <pre><code>memoryview(arr_buffer)[:] = new_data
</code> </pre> <p> The memoryview is really good, but there is a little issue when we are dealing with uint8 RawArrays. The following code will cause an exception: </p> <pre><code>memoryview(arr_buffer_uint8)[:] = new_data_uint8
</code> </pre> <p> There is a solution for uint8 rawarrays using just memoryview and cast methods. However, numpy comes to the rescue and offers a simpler and more generic solution. You just need to convert the rawarray to a numpy representation in the following way: </p> <pre><code>arr_uint8_repr = np.ctypeslib.as_array(arr_buffer_uint8)
arr_uint8_repr[:] = new_data_uint8
</code> </pre> <p> You can navigate to my repository at this specific <a href="https://github.com/devmessias/fury/commit/b1b0caf30db762cc018fc99dd4e77ba0390b2f9e"> commit position </a> and test the streaming examples to see how this little modification improves the performance. </p> <h3>Multiprocessing on different Operating Systems</h3> <p> Serge Koudoro, who is one of my mentors, pointed out an issue with the streaming system running on MacOS.
I don't know many things about MacOS, and as pointed out by Filipi, the way MacOS deals with multiprocessing is very different from the Linux approach. Although we solved the issue discovered by Serge, I need to be more careful about assuming that different operating systems will behave in the same way. If you want to know more, I recommend that you read this post: <a href="https://britishgeologicalsurvey.github.io/science/python-forking-vs-spawn/">Python: Forking vs Spawn</a>. It's also important to read the official documentation from python; it can save you a lot of time. Take a look at what the official python documentation says about the multiprocessing start methods. </p> <img src="https://user-images.githubusercontent.com/6979335/121958121-b0ebb780-cd39-11eb-862a-37244f7f635b.png"> <small>Source: <a href="https://docs.python.org/3/library/multiprocessing.html">https://docs.python.org/3/library/multiprocessing.html</a></small>messias.physics@gmail.com (demvessias)Mon, 14 Jun 2021 18:21:03 +0000https://blogs.python-gsoc.org/en/demvessiass-blog/post-1-a-stadia-like-system-for-data-visualization/Weekly Check-In #1https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-1-21/Hi everyone! My name is Bruno Messias; currently I'm a Ph.D. student at USP/Brazil. This summer I'll develop new tools and features for FURY-GL. Specifically, I'll focus on developing a system for collaborative visualization of large network layouts using FURY and VTK. <h2>What did I do this week?</h2> In my first meeting the mentors explained the rules and the code of conduct inside the FURY organization. We also made some modifications to the timeline and discussed the next steps of my project. I started coding during the community bonding period.
The list below shows my contributions over the past weeks: <ul> <li><a href="https://github.com/fury-gl/fury/pull/437">A FURY/VTK webrtc stream system proposal:</a> for the second part of my GSoC project I need an efficient and easy-to-use streaming system to send the graph visualizations across the Internet. In addition, I also need this for my Ph.D. Therefore, I've been working a lot on this PR. This PR also helps me to achieve the first part of my project, because I don't have a computer with good specs in my house and I need to access an external computer to test the examples for large graphs. </li> <li>Minor improvements to the <a href="https://github.com/fury-gl/fury/pull/422">shader markers PR</a> and the <a href="https://github.com/fury-gl/fury/pull/432/"> fine-tuning opengl state PR</a>. </li> </ul> <h2>Did I get stuck anywhere?</h2> I've been stuck on a performance issue (copying the opengl framebuffer to a python rawarray) which caused a lot of lag in the webrtc streamer. Fortunately, I discovered that I've been using rawarrays in the wrong way. My <a href="https://github.com/fury-gl/fury/pull/437/commits/b1b0caf30db762cc018fc99dd4e77ba0390b2f9e">commit</a> solved this performance issue. <h2>What is coming up next?</h2> This week I'll focus on finishing the #432 and #422 pull requests.messias.physics@gmail.com (demvessias)Tue, 08 Jun 2021 14:46:11 +0000https://blogs.python-gsoc.org/en/demvessiass-blog/weekly-check-in-1-21/