Improving the Workflow Code (based on the Community Feedback)

 

Hey there,

So, there has been some delay in the timeline for my project, but the delay wasn’t wasted time: I have been able to produce good results for visual quality assessment. In fact, the Image Registration workflow is now supplemented by both quantitative and qualitative metrics for assessment.

This blog post reviews what has been done in the Image Registration workflow and the many improvements I have been making to the code lately based on community feedback.

Reviewing the additions to the workflow (for assessing quality)

For quantitative assessment: There is now the option of saving the distance and the optimal parameters as metrics. For details about the code, see the following PR.

(old) PR for quantitative metric addition in the workflow

Testing and benchmarking the Image Registration with DIPY

Qualitative assessment: I am working on a separate branch to create a mosaic of the registered images. This branch isn’t part of master because the primary Image Registration workflow has not been merged (yet). More details about what these mosaics are and how they can help in quality assessment can be found in one of my older posts below.

Visualizing the Registration Progress

Reviewing the improvements to the Image Registration workflow

PR 1581 got a good response from the DIPY development community. I am discussing a few of the issues here (this discussion is by no means exhaustive; I am just selecting the points that I think made a difference). The majority of these issues were noted by the DIPY developer @nilgoyette. Many thanks to @nilgoyette for pointing them out and helping me improve the code.

Modifying the code based on the community feedback (Many thanks to @nilgoyette & @skoudoro for the useful comments)

Improvement-1: Following code consistency and uniform standards. I overlooked the fact that simple things like import statements and line widths were no longer uniform in the file after I added my code. For example, the file used both “,” and “()” for importing multiple classes from a module.

Old Commit

New Commit 

These things don’t affect the functionality of the code, but fixing them makes the code look more consistent. I updated the code to follow the same standards everywhere.

Improvement-2: Moving to assert_almost_equal for comparing long floating point numbers. I had written redundant code that rounded off a floating point number before comparing it with another float, whereas NumPy’s assert_almost_equal is made for exactly this purpose. So I updated the old test case to use assert_almost_equal.

This not only improved the readability of the code but also made the objective of the test clear. Furthermore, using NumPy’s built-in functions made the test cases look more consistent with the rest of the code base.
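To make the difference concrete, here is a minimal sketch of the two styles. The two numbers are made-up values standing in for a metric returned by the workflow; they are not taken from the actual test case.

```python
from numpy.testing import assert_almost_equal

# Hypothetical values standing in for a metric computed by the workflow.
expected_distance = -0.3953547764
computed_distance = -0.39535477641234

# Old style: round both numbers by hand, then compare for exact equality.
assert round(computed_distance, 6) == round(expected_distance, 6)

# New style: NumPy compares the two floats up to the stated number of decimals.
assert_almost_equal(computed_distance, expected_distance, decimal=6)
```

The second form says what the test is about (agreement to 6 decimals) without the reader having to work out why the rounding is there.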

Commit Link

Improvement-3: Reducing the code duplication. The Image Registration workflow is complicated by the fact that it supports multiple registration modes, both progressively and non-progressively. This led to parts of several local functions being duplicated. I have reduced the duplication (marginally, though) by moving part of the original code into a separate function.

Commit Link

Improvement-4: Using the “_” placeholder in Python for values that are returned by a function call but never used. I was using a named variable to hold data returned by a function, but that variable wasn’t used anywhere later in the code, so I moved to the more Pythonic way of holding such data with the “_” placeholder.
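A small sketch of the placeholder idiom follows. The function optimize and its return values are hypothetical and only imitate a call that returns more than the caller needs.

```python
def optimize(params):
    """Hypothetical stand-in for a call that returns several values."""
    best = [p * 0.5 for p in params]
    distance = sum(abs(p) for p in best)
    n_iterations = 10
    return best, distance, n_iterations


# Only the first two results are needed here; "_" marks the third
# as intentionally unused instead of giving it a misleading name.
best_params, distance, _ = optimize([2.0, -4.0])
```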

Commit Link

Improvement-5: Using Python’s built-in assert statement in the test cases. Some of the test cases were using NumPy’s assert_equal to check for equality, but the comparison was against the boolean True/False, so it made more sense to just use the plain assert for such checks. The assert_equal call was not incorrect; assert simply expresses a true/false check more directly.
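As an illustration, assuming a test that only needs to know whether an output file exists (the file name and its content are made up for this sketch):

```python
import os
import tempfile

from numpy.testing import assert_equal

with tempfile.TemporaryDirectory() as tmp_dir:
    out_file = os.path.join(tmp_dir, "out_file.txt")
    with open(out_file, "w") as handle:
        handle.write("registered")

    # Before: a boolean compared against True with a generic equality helper.
    assert_equal(os.path.exists(out_file), True)

    # After: the plain assert states the intent directly.
    assert os.path.exists(out_file)
    file_was_written = os.path.exists(out_file)
```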

Commit Link

All this feedback made the code more consistent and easier to read. All these improvements (along with other changes to the code base) are now part of PR 1581 and are waiting to be merged.

In the coming weeks, I will be sharing more details about the results of the apply_transform workflow (as promised earlier) and also about the awesome new visualizations that can be done with registered data using native matplotlib calls. More details about the apply_transform workflow can be found in the following post:

Transforming multiple MRI Images (in a Jiffy!)

Adios for now!

Parichit.

DIPY Workflow, Image Registration and Documenting the Test Cases

Some background: One of the important objectives of my GSoC project is to develop quality workflows to serve the scientific community better. Good workflows are crucial for outreach and for well-defined usage of the various features in the DIPY package.

How can workflows support outreach? DIPY contains implementations of many scientific algorithms that are used in routine analysis of MRI data. A few of these implementations are fairly straightforward and easy to grasp and use (credit also goes to Python’s intuitive syntax and to the open source community for contributing to the DIPY project).

However, in outreach, the focus is on users who are not programmers first, for example medical practitioners, life sciences experts or users in academia who would like to use DIPY quickly to address their own research problems. This does not mean that they cannot implement their own packages (surely they can), and DIPY as a community project depends on feedback and improvements from many such users.

The objective is to provide an end-to-end processing pipeline to the user with a minimal learning curve. In workflows, several individual components of DIPY (a module, a function) are combined in a well-defined manner, following good software development practices. This hides the low-level details from users while still allowing them to use DIPY.

Experienced users can explore the individual components of the workflows and have fine-grained control by tweaking the parameters (accessible through the help).

How can workflows ensure well-defined usage? Each workflow that combines multiple components also follows a rigorous testing and quality assurance procedure to check and validate the output from various intermediate components. This results in a well-tested series of steps to achieve a specific objective with DIPY.

These past 2 weeks I have been working on creating the image registration workflow while also fixing other issues in DIPY (See this post).

The Image Registration: Put simply, registration means aligning a pair of images (MRI data) so that downstream analysis can be performed on the registered image. This is needed because the raw data obtained from diffusion MRI consists of moving images that have to be pre-processed before other types of analysis.

The registration of MRI data is a complex process with multiple options available for registering the images, for example:

A) Registration based on the Center of Mass.

B) Registration based on the Translation of Images.

C) Registration based on the Rigid body Transformation.

D) Full Affine Registration, that involves center of mass, translation, rigid body transformation, shear, and scaling of the data.
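To give a feel for the simplest of these modes, here is a rough NumPy-only sketch of option A. It is not DIPY’s implementation: it works purely in voxel coordinates on tiny synthetic 2D arrays and ignores the affine of the images, and the function names are my own.

```python
import numpy as np


def center_of_mass(image):
    """Intensity-weighted mean voxel coordinate of an N-D image."""
    grid = np.indices(image.shape).reshape(image.ndim, -1)
    weights = image.reshape(-1).astype(float)
    return grid @ weights / weights.sum()


def center_of_mass_shift(static, moving):
    """Translation (in voxels) that moves the center of mass of
    `moving` onto the center of mass of `static`."""
    return center_of_mass(static) - center_of_mass(moving)


# Two synthetic images, each with a single bright 2x2 square.
static = np.zeros((10, 10))
static[4:6, 4:6] = 1.0
moving = np.zeros((10, 10))
moving[1:3, 6:8] = 1.0

shift = center_of_mass_shift(static, moving)  # 3 rows down, 2 columns left
```

The other modes (translation, rigid, affine) refine this kind of starting alignment by optimizing progressively more parameters.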

Below is the link to the workflow that I have developed for registering the image data.

Commit Link: Image Registration Workflow

In the coming weeks, I will be improving the unit tests for this workflow. In addition to testing the expected behavior (correct output), the test cases will also check the erroneous cases, where an error is triggered intentionally.
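A test of this kind could be sketched as below. The function run_registration and its list of allowed modes are hypothetical stand-ins, not the workflow’s real interface; the point is only the pairing of a “correct output” check with an “intentional error” check.

```python
from numpy.testing import assert_raises


def run_registration(static_file, moving_file, transform="affine"):
    """Hypothetical stand-in for the run method of a registration workflow."""
    allowed = ("com", "trans", "rigid", "affine")
    if transform not in allowed:
        raise ValueError("Unknown transform: %r" % (transform,))
    return "%s registered to %s using %s" % (moving_file, static_file,
                                             transform)


# Expected behavior: a valid mode produces an output.
result = run_registration("static.nii.gz", "moving.nii.gz", transform="rigid")
assert "rigid" in result

# Erroneous case: an invalid mode is passed on purpose and must raise.
assert_raises(ValueError, run_registration,
              "static.nii.gz", "moving.nii.gz", transform="notransform")
```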

Commit Link: Testing the Image Registration Workflow

Together, the registration workflow and the testing framework will give users a standardized way to register images in various modes (with the assurance that the output is generated by code that has passed multiple tests).

Documenting the Use Case(s) for IOIterator in the Workflow

As a good documentation practice, I also created multiple use cases for running a workflow with a combination of input and output parameters.

This was done specifically to check how output files are created depending on the location of the input files, the use of wildcards and the parameters enabled in the workflow.

These use cases will serve as a comprehensive guide for users looking to learn about various usage scenarios of workflows.

The documentation can be found at the following link:

Commit Link: Documenting the use cases 

Extract from the Document: (dipy_append_text is the sample workflow created for the purpose of this testing.)

Here, dipy_test_cases is the parent directory containing all the experiment directories (exp1, exp2 etc.) and the respective input files for testing.

Test case-1: Both input files are present in the same directory and no output directory path is provided.

Directory: exp1 (experiment1)

Command: dipy_append_text in1.txt in2.txt

Output: An output file, ‘out_file.txt’, is written in the same directory.

When the --force flag is used, it enforces the overwriting of the output file.

Command: dipy_append_text in1.txt in2.txt --force

Optional flag: --force

Test case-2: An output directory within the current directory is specified and the --force flag is used.

Directory: exp1 (experiment1)

Command: dipy_append_text in1.txt in2.txt --force --out_dir tmp

Output: An output file (out_file.txt) is written in the directory ‘tmp’ within the exp1 directory.

Optional flags: --force --out_dir

Test case-3: Going one level up in the directory tree and executing the workflow with input files and path.

Directory: dipy_test_cases

Command: dipy_append_text exp1/in1.txt exp1/in2.txt --force --out_dir tmp

Output: An output file (out_file.txt) is written in the directory ‘tmp’ within the exp1 directory.

Note: Due to the --force flag, the previous ‘tmp’ directory was overwritten.

Optional flags: --force --out_dir

Adios for now!

Parichit

 

Finding and fixing the ‘small and crucial’ issues in DIPY

Finding and fixing the issues: After a week of brainstorming and reading through the basic tutorials and documentation of DIPY, I discovered the following issues in the documentation and the code base.

Each of the reported issues is described below:

1. Fixing the documentation of the workflows: The tutorial webpage for workflow creation in DIPY (workflow) did not mention importing the newly created workflow. It only mentioned importing run_flow from flow_runner. That works only when the workflow is called directly from the command line; it does not work if the workflow has to be wrapped in a separate Python file and called from elsewhere.

Solving the issue: I updated the documentation and included the required import statement in the documentation.

Commit Link: Updated the workflow_creation.py 

This Pull request has been successfully merged with the code base 🙂

2. Displaying a nice and helpful message when a workflow is invoked without any inputs: DIPY requires workflows to be invoked with certain input parameters, and both the number and the format of the inputs matter.

Behavior: Invoking the workflow without any input parameters just resulted in an error trace without any helpful message for the user. (This stack trace was hard to decipher.)

Solving the issue: This behavior was handled inside the argparse.py file, where a conditional check now displays an appropriate message to the user about the missing parameters.

PR number: 1523

Commit Link:  Showing help when no input parameters are provided to the workflow

This Pull request has been successfully merged with the code base 🙂
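The idea behind item 2 can be sketched with the standard argparse module. This is only an illustration under my own assumptions (the parser, its single argument and the helper parse_or_show_help are made up); it is not the code from the PR.

```python
import argparse
import io


def build_parser():
    """Hypothetical parser for a workflow taking one or more input files."""
    parser = argparse.ArgumentParser(
        prog="dipy_append_text",
        description="Append the content of the input files.")
    parser.add_argument("input_files", nargs="+", help="Input text files.")
    return parser


def parse_or_show_help(parser, argv, stream):
    """Print the help text instead of failing when no arguments are given."""
    if not argv:
        parser.print_help(file=stream)
        return None
    return parser.parse_args(argv)


parser = build_parser()
stream = io.StringIO()

# No inputs: the user gets the help text rather than a stack trace.
no_args = parse_or_show_help(parser, [], stream)
help_text = stream.getvalue()

# With inputs: parsing proceeds as usual.
args = parse_or_show_help(parser, ["in1.txt", "in2.txt"], stream)
```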

3. Suppressing the harmless h5py warnings: Because DIPY depends on certain features of an older version of h5py, the h5py package cannot be updated in the new release.

Behavior: There was always a ‘FutureWarning’ from the h5py package whenever a workflow was invoked.

The root cause analysis: All the workflows make use of run_flow from flow_runner, and run_flow is imported before any other import in the workflow script. That made flow_runner the right place to handle this warning.

Solving the issue: I created a custom handler in flow_runner.py to catch the ‘FutureWarning’. This suppressed the harmless (but annoying) warning from h5py.

PR number: 1523

Commit Link: Suppressing the ‘FutureWarning’ from the h5py package. 

This Pull request has been successfully merged with the code base 🙂
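For item 3, one common way to silence a specific warning category is the standard warnings module. The sketch below shows that general technique; noisy_import is a made-up function that imitates a library emitting a FutureWarning, and the filter shown is not necessarily what the merged code does.

```python
import warnings


def noisy_import():
    """Hypothetical stand-in for importing a package that warns on import."""
    warnings.warn("this conversion is deprecated", FutureWarning)
    return "module loaded"


def count_future_warnings(suppress):
    """Run noisy_import and count the FutureWarnings that get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if suppress:
            # The filter has to be installed before the noisy import runs.
            warnings.filterwarnings("ignore", category=FutureWarning)
        noisy_import()
    return sum(issubclass(w.category, FutureWarning) for w in caught)


shown = count_future_warnings(suppress=False)
hidden = count_future_warnings(suppress=True)
```

This is also why the placement matters: a filter only helps if it is registered before the import that triggers the warning.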

4. Catching the argument mismatch between the run method and the doc string: All workflows require strict documentation of the parameters provided to the run method. There are formatting restrictions imposed by adherence to the PEP8 code styling guidelines. Also, both the positional and the optional parameters need to be documented.

Behavior: The workflow exited with a cryptic error trace (usually difficult to understand). This happened whenever there was a mismatch between the number of parameters mentioned in the doc string and in the run method, because there was no conditional check for handling this case.

The root cause of the error: In the file base.py, the number of arguments in the doc string and in the run method were not compared to establish equal length (which is required), so the workflow simply led to a cumbersome error trace whenever they differed.

Solving the issue: I created a simple conditional check to ensure that the doc string parameters match exactly with those of the run method, and raised a ValueError otherwise.

PR number: 1533

Commit Link: Mismatching arguments between the doc string and the run method

This Pull request has been successfully merged with the code base 🙂
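The check in item 4 can be imitated as follows. This is a simplified sketch and not the code in base.py: it finds documented parameters with a small regular expression over a NumPy-style doc string, and the two example functions are made up.

```python
import inspect
import re


def documented_parameters(func):
    """Names written as 'name : type' at the start of a doc string line."""
    doc = inspect.getdoc(func) or ""
    return re.findall(r"^(\w+) : ", doc, flags=re.MULTILINE)


def check_docstring_matches_signature(func):
    """Raise ValueError if the doc string and the signature disagree."""
    sig_names = [name for name in inspect.signature(func).parameters
                 if name != "self"]
    doc_names = documented_parameters(func)
    if len(sig_names) != len(doc_names):
        raise ValueError(
            "%s takes %d parameters but its doc string documents %d"
            % (func.__name__, len(sig_names), len(doc_names)))
    return sig_names


def run(input_files, out_dir=""):
    """Hypothetical run method with complete documentation.

    Parameters
    ----------
    input_files : string
        Path to the input files.
    out_dir : string, optional
        Output directory.
    """


def run_missing_doc(input_files, out_dir=""):
    """Hypothetical run method that forgets to document out_dir.

    Parameters
    ----------
    input_files : string
        Path to the input files.
    """


checked = check_docstring_matches_signature(run)
```

Calling check_docstring_matches_signature(run_missing_doc) raises a ValueError with a readable message instead of a long trace.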

Adios for now!

Parichit

Sneak-Peek into the DIPY Workflows and Philosophy

Introduction

Well, first things first: DIPY stands for Diffusion Imaging in Python. DIPY is medical imaging software meant to analyze and interpret the data generated by MRI systems (primarily brain images and other supporting data such as system parameters and metadata). DIPY is an open source initiative (under the umbrella of the Python Software Foundation) and provides opportunities for scientific package implementation, powerful software engineering, exciting visualization techniques that leverage state of the art hardware (GPU shaders and more) and data-driven analytics (algorithms to improve image registration and more).

My Work and Its Usefulness

As for me, I will be working on creating feature-rich and user-friendly workflows that will become part of the DIPY source code. DIPY has a significant collection of scientific algorithms that can be linked via custom Python scripts to create and deliver flexible workflows to the end user. Though DIPY is powerful in functionality, not all of its tutorials have their own workflows, well, not yet. After passing manual and automated validation and checks, these workflows will help medical experts, researchers, and medical doctors to quickly analyze MRI data in a standard manner.

Exploring the Code Base

So far, I have been going through the code base of DIPY and learning to navigate its source code, that is, understanding how the code is structured and organized. In this context, Dr. Eleftherios Garyfallidis and Serge Koudoro, founder and core developer of DIPY respectively, have been very helpful. Now, I have a clear understanding of how the files and data are organized in the code base.

A few hours and several test runs later, I realized why they created the introspective parser and where there is scope for quick improvement. We discussed a list of things to be done on a priority basis.

Also To be Added

A good amount of work will also be dedicated to testing the workflows on a variety of datasets and platforms. This will ensure that the code behaves as expected and in turn will add to the quality of the package.

A relatively challenging part of the assignment will be to integrate a visualization tool or intermediate output parsers to do a sanity check on the quality of the intermediate output. This should catch errors early and reduce troubleshooting down the line.

Closing for now 🙂

That’s it, for now, folks.

Stay tuned for real development updates and exciting new workflows. Oh yes, there will be awesome visualization too.

DIPY Home

DIPY GitHub Code Base

My Forked Repository

Adios for now!

Parichit