Over the last two weeks, I used line_profiler (kernprof) to profile the function _update_mutual_information() in imaffine.py line by line while running the affine registration example.
The most time-consuming functions were: 1. gradient() – 43.5%, 2. update_gradient_dense() – 38.5%, 3. _update_histogram() – 17.8%.
For the first function, the actual time-consuming part is inside _gradient_3d() in vector_fields.pyx. I first Cythonized it further by making it a cdef void nogil function, which gave a 1.47x speedup. I then parallelized it with prange, using a local helper function for the per-slice work, which brought the total speedup to 6.48x.
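The shape of that change can be sketched as below. This is illustrative, not the exact DIPY code: the function and variable names are hypothetical, and boundary handling is omitted. The key point is that each z-slice writes only its own rows of the output, so the outer loop can be distributed with prange without any shared writes.

```cython
# cython: boundscheck=False, wraparound=False
from cython.parallel import prange

cdef void _gradient_3d_slice(double[:, :, :] vol,
                             double[:, :, :, :] grad,
                             int k) nogil:
    # Central differences for one z-slice (interior voxels only).
    cdef int i, j
    for i in range(1, vol.shape[1] - 1):
        for j in range(1, vol.shape[2] - 1):
            grad[k, i, j, 0] = 0.5 * (vol[k + 1, i, j] - vol[k - 1, i, j])
            grad[k, i, j, 1] = 0.5 * (vol[k, i + 1, j] - vol[k, i - 1, j])
            grad[k, i, j, 2] = 0.5 * (vol[k, i, j + 1] - vol[k, i, j - 1])

def gradient_3d(double[:, :, :] vol, double[:, :, :, :] grad):
    cdef int k
    # Slices are independent: no two threads touch the same output
    # element, so the loop is safe to parallelize with prange.
    for k in prange(1, vol.shape[0] - 1, nogil=True):
        _gradient_3d_slice(vol, grad, k)
```

Factoring the slice body into a separate cdef nogil function keeps the prange body free of Python objects, which is what lets Cython release the GIL for the whole loop.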
For the second function, the time-consuming part is inside _joint_pdf_gradient_dense_3d() in parzenhist.pyx. Making it cdef void and nogil gave a 3.05x speedup. However, parallelizing it with prange actually slowed it down, so I left it serial.
For the third one, the time-consuming part is _compute_pdfs_dense_3d() in parzenhist.pyx. Again I made it cdef void and nogil, but the runtime was almost unchanged as a result, and parallelizing it with prange once more made it slower. So, as with the second function, I kept it serial.
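A likely reason these two kernels resist prange (my interpretation, not a measured diagnosis) is their access pattern: unlike the gradient, every voxel scatters an update into a small shared histogram, so under a naive prange two threads can hit the same bin. The simplified, hypothetical sketch below marks the problematic write; making it safe needs locks, atomics, or per-thread histogram copies plus a reduction, and that overhead can easily exceed the gain for a histogram this small.

```cython
cdef void _joint_pdf_sketch(double[:, :, :] static,
                            double[:, :, :] moving,
                            double[:, :] joint) nogil:
    # Illustrative only: bin computation is far simpler than the real
    # Parzen-window code, which spreads each sample over several bins.
    cdef int k, i, j, r, c
    for k in range(static.shape[0]):
        for i in range(static.shape[1]):
            for j in range(static.shape[2]):
                r = <int>static[k, i, j]   # static-image bin (simplified)
                c = <int>moving[k, i, j]   # moving-image bin (simplified)
                joint[r, c] += 1.0         # shared write: races under prange
```

This contrasts directly with the gradient kernel, where each iteration owns a disjoint piece of the output.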
I then moved on to parallelizing the diffeomorphic registration. I first profiled the diffeomorphic registration example, then ran line_profiler on it, and found that the most time-consuming function is __invert_models() – 47% – and that the actual work happens in _compose_vector_fields_3d(). I tried to parallelize it, but the speedup was inconclusive. I will investigate this further.
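For context, vector-field composition computes out(x) = d2(x) + d1(x + d2(x)) by sampling d1 at warped positions. The 2D sketch below is a deliberately simplified, hypothetical version (nearest-neighbor sampling instead of the real code's linear interpolation, names illustrative). Each output element is written by exactly one iteration, so in principle the outer loop should parallelize like the gradient; the irregular, data-dependent reads of d1 may be why the speedup I measured was inconclusive.

```cython
cdef void _compose_sketch_2d(double[:, :, :] d1,
                             double[:, :, :] d2,
                             double[:, :, :] out) nogil:
    # out(x) = d2(x) + d1(x + d2(x)); fields have shape (rows, cols, 2).
    cdef int r, c, rr, cc
    for r in range(d1.shape[0]):
        for c in range(d1.shape[1]):
            # Warped position, rounded to the nearest grid point.
            rr = <int>(r + d2[r, c, 0] + 0.5)
            cc = <int>(c + d2[r, c, 1] + 0.5)
            out[r, c, 0] = d2[r, c, 0]
            out[r, c, 1] = d2[r, c, 1]
            if 0 <= rr < d1.shape[0] and 0 <= cc < d1.shape[1]:
                out[r, c, 0] += d1[rr, cc, 0]  # only d2 contributes
                out[r, c, 1] += d1[rr, cc, 1]  # outside the grid
```

Writes here are disjoint per output row, so the data hazard argument from the histogram kernels does not apply; memory bandwidth and cache misses on the gathered d1 reads are the more plausible bottleneck to check next.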