demvessias's Blog

Weekly Check-In #3

demvessias
Published: 06/21/2021

Hi everyone! My name is Bruno Messias. This summer I'll be developing new tools and features for FURY-GL. Specifically, I'll focus on developing a system for collaborative visualization of large network layouts using FURY and VTK.

What did you do this week?

  • PR fury-gl/fury#422 (merged): Integrated the 3D impostor spheres with the marker actor
  • PR fury-gl/fury#422 (merged): Fixed some issues with my marker PR, which is now merged into FURY
  • PR fury-gl/fury#432: Made some improvements in my PR, which can be used to fine-tune the OpenGL state on VTK
  • PR fury-gl/fury#437: Made several improvements in my streamer proposal for FURY. Most of those improvements are related to memory management. Using SharedMemory from Python 3.8, it's now possible to use the streamer directly in a Jupyter notebook without blocking
  • PR fury-gl/helios#1: First version of the async network layout using a force-directed algorithm

Did I get stuck anywhere?

A python-core issue

I spent some hours trying to track down this issue, but it's now solved through the commit devmessias/fury/commit/071dab85

The SharedMemory class from python>=3.8 offers a new way to share memory resources between unrelated processes. One of the advantages of using SharedMemory instead of the RawArray from multiprocessing is that SharedMemory allows sharing memory blocks between processes that are not related through a fork or spawn. This behavior made our Jupyter integration possible and simplifies the use of the streaming system. However, I found an issue in the shared-memory implementation.
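To make that concrete, here is a minimal sketch (sizes and dtype are just illustrative, not the actual FURY code) of two handles attaching to the same block by name, the way two unrelated processes would:

```python
import numpy as np
from multiprocessing import shared_memory

# "Process A": create a named shared-memory block
shm_a = shared_memory.SharedMemory(create=True, size=256 * 4)
arr_a = np.ndarray((256,), dtype=np.float32, buffer=shm_a.buf)
arr_a[:] = 1.0

# "Process B": any unrelated process can attach using only the name --
# no fork or spawn relationship with the creator is required
shm_b = shared_memory.SharedMemory(name=shm_a.name)
arr_b = np.ndarray((256,), dtype=np.float32, buffer=shm_b.buf)
value = float(arr_b[0])
print(value)  # 1.0

del arr_a, arr_b  # release the numpy views before closing the blocks
shm_b.close()
shm_a.close()
shm_a.unlink()
```

Note the explicit `del` of the numpy views: closing a block while a view is still exported raises a `BufferError`.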

Let’s see the following scenario:

1-Process A creates a shared memory block X
2-Process A creates a subprocess B using popen (shell=False)
3-Process B reads X
4-Process B closes X
5-Process A kills B
6-Process A closes X
7-Process A calls unlink() on the shared memory resource X

This scenario should work well. Calling unlink() on X is the right approach, as discussed in the official Python documentation. However, there is an open issue which I think is related to the scenario above. Fortunately, I could use a monkey-patching solution to work around it while we wait for the python-core team to fix the resource_tracker issue (38119).
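For reference, the widely shared workaround for the resource_tracker issue looks like the sketch below; the actual fix lives in the commit mentioned above and may differ in detail. It stops the tracker from registering (and thus prematurely cleaning up) shared_memory blocks:

```python
from multiprocessing import resource_tracker

def remove_shm_from_resource_tracker():
    """Monkey-patch multiprocessing.resource_tracker so SharedMemory blocks
    are not tracked (workaround for Python issue 38119)."""

    def fix_register(name, rtype):
        if rtype == "shared_memory":
            return
        return resource_tracker._resource_tracker.register(name, rtype)

    resource_tracker.register = fix_register

    def fix_unregister(name, rtype):
        if rtype == "shared_memory":
            return
        return resource_tracker._resource_tracker.unregister(name, rtype)

    resource_tracker.unregister = fix_unregister

    if "shared_memory" in resource_tracker._CLEANUP_FUNCS:
        del resource_tracker._CLEANUP_FUNCS["shared_memory"]

# Call this once, before any SharedMemory block is created
remove_shm_from_resource_tracker()
```

With the patch applied, each process is responsible for calling close() itself, and exactly one process must call unlink().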

What is coming up next?

I'm planning to work on fury-gl/fury#432 and fury-gl/helios#1.

Post #1 - A Stadia-like system for data visualization

demvessias
Published: 06/14/2021

Hi all! In this post I'll talk about PR #437.

There are several reasons to have a streaming system for data visualization. Because I'm doing a PhD in a developing country, I always need to think of the cheapest way to use the computational resources available. For example, with GPU prices increasing, it's necessary to share a machine with a GPU between different users in different locations. Therefore, to convince my Brazilian friends to use FURY I need to code with the low-budget scenario in mind.

To construct the streaming system for my project I’m thinking about the following properties and behaviors:

  1. I want to avoid blocking the code execution in the main thread (where the VTK/FURY instance resides)
  2. The streaming should work inside of a low-bandwidth environment
  3. I need an easy way to share the rendering result. For example, using the free version of ngrok

To achieve property 1 we need to circumvent the GIL. Using the threading module alone is not good enough, because Python threads can't be used for parallel CPU computation. In addition, for better organization it's preferable to define the server system as a decoupled module. Therefore, I believe Python's multiprocessing library will fit our purposes very well.

For the streaming system to work smoothly in a low-bandwidth scenario we need to choose the protocol wisely. In recent years the WebRTC protocol has been used in a myriad of applications, such as Google Hangouts and Google Stadia, to achieve low-latency behavior. Therefore, I chose WebRTC as the first protocol to be available in the streaming system proposal.

To achieve the third property, we must be economical in adding requirements and dependencies.

Currently the system has some issues, but it's already working. You can see some tutorials about how to use this streaming system here. After running one of these examples you can easily share the results and interact with other users, for example using ngrok:


  ./ngrok http 8000


How does it work?

The image below is a simple representation of the streaming system.

As you can see, the streaming system is made up of different processes that share some memory blocks with each other. One of the hardest parts of this PR was to code this sharing between different objects: VTK, numpy, and the webserver. Below I'll discuss some of the technical issues that I had to learn about and circumvent.

Sharing data between processes

We want to avoid any kind of unnecessary data duplication or expensive copy/write actions. We can achieve this economy of computational resources using Python's multiprocessing module.

multiprocessing RawArray

The RawArray from multiprocessing allows resources to be shared between different processes. However, there are some tricks to getting better performance when dealing with RawArrays. For example, take a look at my PR at an older stage. At that stage my streaming system was working well, but one of my mentors (Filipi Nascimento) saw a huge latency for high-resolution examples. My first thought was that the latency was caused by the GPU-CPU copy from the OpenGL context. However, I discovered that I had been using RawArrays wrong my entire life!
See for example this line of code: fury/stream/client.py#L101. The code below shows how I had been updating the raw arrays:


raw_arr_buffer[:] = new_data

This works fine for small and medium-sized arrays, but for large ones it takes a huge amount of time, more than the GPU-CPU copy. The explanation for this bad performance is available here: Demystifying sharedctypes performance. The solution, which gives a stupendous performance improvement, is quite simple: RawArrays implement the buffer protocol, therefore we just need to use a memoryview:


memoryview(arr_buffer)[:] = new_data

The memoryview is really good, but there's a little issue when we are dealing with uint8 RawArrays. The following code will cause an exception:


memoryview(arr_buffer_uint8)[:] = new_data_uint8

There is a solution for uint8 RawArrays using just memoryview's cast method. However, numpy comes to the rescue and offers a simpler and more generic solution: you just need to convert the RawArray to a numpy representation in the following way:


import numpy as np

arr_uint8_repr = np.ctypeslib.as_array(arr_buffer_uint8)
arr_uint8_repr[:] = new_data_uint8

You can navigate to my repository at this specific commit and test the streaming examples to see how this little modification improves the performance.
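To see the difference yourself, here is a small self-contained benchmark (the array size is arbitrary) comparing the naive slice assignment with a numpy view over the same RawArray:

```python
import ctypes
import time
from multiprocessing import RawArray

import numpy as np

n = 1_000_000
raw = RawArray(ctypes.c_float, n)
new_data = np.random.rand(n).astype(np.float32)

# Naive: every element goes through the ctypes proxy object
t0 = time.perf_counter()
raw[:] = new_data
naive = time.perf_counter() - t0

# Fast: a numpy view over the same buffer, copied in bulk
arr = np.ctypeslib.as_array(raw)
t0 = time.perf_counter()
arr[:] = new_data
fast = time.perf_counter() - t0

print(f"naive: {naive:.3f}s, numpy view: {fast:.5f}s")
```

On my understanding of the mechanics, the naive path converts each element through a ctypes proxy, while the numpy view is a single bulk memory copy; the gap grows with the array size.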

Multiprocessing on different operating systems

Serge Koudoro, who is one of my mentors, pointed out an issue with the streaming system running on macOS. I don't know much about macOS, and as pointed out by Filipi, the way macOS deals with multiprocessing is very different from the Linux approach. Although we solved the issue discovered by Serge, I need to be more careful about assuming that different operating systems will behave in the same way. If you want to know more, I recommend that you read this post: Python: Forking vs Spawn. It's also important to read the official Python documentation; it can save you a lot of time. Take a look at what the official Python documentation says about the multiprocessing start methods:

<small>Source: https://docs.python.org/3/library/multiprocessing.html</small>

Weekly Check-In #1

demvessias
Published: 06/08/2021

Hi everyone! My name is Bruno Messias and I'm currently a Ph.D. student at USP/Brazil. This summer I'll be developing new tools and features for FURY-GL. Specifically, I'll focus on developing a system for collaborative visualization of large network layouts using FURY and VTK.

What did I do this week?

In my first meeting the mentors explained the rules and the code of conduct of the FURY organization. We also made some modifications in the timeline and discussed the next steps of my project. I started coding during the community bonding period. The items below show my contributions in the past weeks:
  • A FURY/VTK WebRTC streaming system proposal: for the second part of my GSoC project I need an efficient and easy-to-use streaming system to send the graph visualizations across the Internet. In addition, I also need this for my Ph.D. Therefore, I've been working a lot on this PR. This PR also helps me achieve the first part of my project, because I don't have a computer with good specs at home and I need to access an external computer to test the examples for large graphs.
  • Minor improvements to the shader markers PR and the fine-tuning OpenGL state PR.

Did I get stuck anywhere?

I got stuck on a performance issue (copying the OpenGL framebuffer to a Python RawArray) which caused a lot of lag in the WebRTC streamer. Fortunately, I discovered that I had been using RawArrays in the wrong way. My commit solved this performance issue.

What is coming up next?

This week I'll focus on finishing the #432 and #422 pull requests.