DIPY Workflow, Image Registration and Documenting the Test Cases





Some background: One of the important objectives of my GSoC project is to develop quality workflows to serve the scientific community better. Good workflows are crucial to enable the outreach and well-defined usage of the various features in the DIPY package.

How can workflows support outreach? DIPY contains implementations of many scientific algorithms that are used in routine analysis of MRI data. Some of these implementations are fairly straightforward and easy to grasp and use (credit also goes to Python’s intuitive syntax and the open source community for contributing to the DIPY project).

However, in outreach, the focus is on users who are less comfortable with programming, for example, medical practitioners, life sciences experts or users in academia who would like to leverage DIPY quickly to address their own research problems. This does not mean that they cannot implement their own packages (surely they can), and DIPY as a community project depends on feedback and improvements from many such users.

The objective is to provide an end-to-end processing pipeline to the user with a minimal learning curve. In workflows, several individual components of DIPY (modules, functions) are combined in a well-defined manner and delivered following good software development practices. This abstracts away the low-level details from the users while still allowing them to use DIPY.

Experienced users can explore the individual components of the workflows and gain fine-grained control by tweaking the parameters, which are accessible through the help of each workflow.

How can workflows ensure well-defined usage? Each workflow that combines multiple components also follows a rigorous testing and quality assurance procedure to check and validate the output from various intermediate components. This results in a well-tested series of steps to achieve a specific objective with DIPY.

For the past two weeks I have been working on creating the image registration workflow while also fixing other issues in DIPY (see this post).

The Image Registration: Put simply, registration means aligning a pair of images (MRI data) so that the downstream analysis can be performed on the registered image. This is needed because the raw data obtained from dMRI consist of moving images, which must be pre-processed before other types of analysis.

The registration of MRI data is a complex process with multiple options available for registering the images, for example:

A) Registration based on the Center of Mass.

B) Registration based on the Translation of Images.

C) Registration based on the Rigid body Transformation.

D) Full Affine Registration, which involves center of mass, translation, rigid body transformation, shear, and scaling of the data.
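To make option A a little more concrete, here is a toy sketch in plain Python. It is not DIPY's implementation: the helper names center_of_mass and com_translation are made up for this illustration, and the "images" are tiny 2D lists rather than real MRI volumes. The idea is only to show that center-of-mass registration boils down to computing the intensity-weighted centroid of each image and shifting one onto the other.

```python
def center_of_mass(image):
    """Return the intensity-weighted (row, col) centroid of a 2D image."""
    total = 0.0
    row_sum = 0.0
    col_sum = 0.0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            total += value
            row_sum += r * value
            col_sum += c * value
    if total == 0:
        raise ValueError("image has no signal; center of mass is undefined")
    return (row_sum / total, col_sum / total)


def com_translation(static, moving):
    """Shift (d_row, d_col) that moves the centroid of `moving` onto `static`."""
    s_r, s_c = center_of_mass(static)
    m_r, m_c = center_of_mass(moving)
    return (s_r - m_r, s_c - m_c)


# Toy data: the same 2x2 bright square, placed differently in each image.
static = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
moving = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

shift = com_translation(static, moving)
print(shift)  # (1.0, 1.0): move one row down and one column right
```

The other modes (translation, rigid, affine) refine this starting point by optimizing progressively more parameters, which is why the full affine option lists all of the earlier steps.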

Below is the link to the workflow that I have developed for registering the image data.

Commit Link: Image registration Workflow

In the coming weeks, I will be improving the unit tests for this workflow. In addition to testing the expected behavior (correct output), the test cases will also check the failure cases (where an error is triggered intentionally).
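As a rough sketch of what "testing both sides" can look like, the snippet below uses Python's standard unittest module. The function run_registration and the mode names are hypothetical stand-ins invented for this example; they are not the actual workflow or its tests.

```python
import unittest

# Hypothetical mode names, used only for this illustration.
SUPPORTED_MODES = ("com", "trans", "rigid", "affine")


def run_registration(mode):
    """Hypothetical stand-in for a registration workflow entry point."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unknown registration mode: {mode!r}")
    return f"registered with {mode}"


class TestRegistration(unittest.TestCase):
    def test_expected_behavior(self):
        # Correct input should produce the expected output.
        self.assertEqual(run_registration("rigid"), "registered with rigid")

    def test_intentional_error(self):
        # An intentionally wrong input should fail loudly, not silently.
        with self.assertRaises(ValueError):
            run_registration("not_a_mode")


# Run the two tests programmatically (avoids unittest.main() exiting the script).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRegistration)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The second test is the interesting one: it passes only if the error is actually raised.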

Commit Link: Testing the Image Registration Workflow

Together, the registration workflow and the testing framework will provide a standardized option for users to register images in various modes (and be assured that the output is generated after passing multiple tests).

Documenting the Use Case(s) for IOIterator in the Workflow

As a good documentation practice, I also created multiple use cases for running a workflow with a combination of input and output parameters.

This was done specifically to check how output files are created in response to the location of the input files, the use of wildcards and the parameters enabled in the workflow.

These use cases will serve as a comprehensive guide for users looking to learn about various usage scenarios of workflows.

The documentation can be found at the following link:

Commit Link: Documenting the use cases 

Extract from the Document: (dipy_append_text is the sample workflow created for the purpose of this testing.)

Each test case lists its details and, where given, the optional flags. (dipy_test_cases is the parent directory containing all the experiment directories (exp1, exp2, etc.) and the respective input files for testing.)

1. Test case-1: Both input files are present in the same directory and no output directory path is provided.

Directory: exp1 (experiment1)

Command: dipy_append_text in1.txt in2.txt

Output: An output file (‘out_file.txt’) is written in the same directory.

When the --force flag is used, the output file is overwritten.

Command: dipy_append_text in1.txt in2.txt --force

2. Test case-2: An output directory within the current directory is specified and the --force flag is used.

Directory: exp1 (experiment1)

Command: dipy_append_text in1.txt in2.txt --force --out_dir tmp

Output: An output file (out_file.txt) is written in the directory ‘tmp’ within the exp1 directory.

Optional flags: --force --out_dir

3. Test case-3: Going one level up in the directory and executing the workflow with input files and path.

Directory: dipy_test_cases

Command: dipy_append_text exp1/in1.txt exp1/in2.txt --force --out_dir tmp

Output: An output file (out_file.txt) is written in the directory ‘tmp’ within the exp1 directory.

Note: Due to the --force flag, the previous ‘tmp’ directory is overwritten by this command.

Optional flags: --force --out_dir
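The behavior seen in these three test cases can be summarized in a few lines of Python. This is only a model of what the test cases show, not the code of DIPY's IOIterator: the helper names resolve_output_path and may_write are invented here, and the default name out_file.txt is taken from the test cases above.

```python
import os


def resolve_output_path(input_paths, out_dir="", out_file="out_file.txt"):
    """Place the output next to the first input, optionally inside out_dir.

    Assumption drawn from the test cases: a relative out_dir is resolved
    against the directory of the input files, not the current directory.
    """
    base_dir = os.path.dirname(input_paths[0])
    return os.path.normpath(os.path.join(base_dir, out_dir, out_file))


def may_write(path, force=False):
    """Allow writing if the target does not exist yet, or if force is set."""
    return force or not os.path.exists(path)


# Test case-1: inputs in the current directory, no output directory given.
print(resolve_output_path(["in1.txt", "in2.txt"]))

# Test case-2: same inputs, output directory 'tmp'.
print(resolve_output_path(["in1.txt", "in2.txt"], out_dir="tmp"))

# Test case-3: run from the parent directory; output still lands in exp1/tmp.
print(resolve_output_path(["exp1/in1.txt", "exp1/in2.txt"], out_dir="tmp"))
```

Note how the third call ends up inside exp1 even though the command is issued one level up, which matches the output described for Test case-3.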

That’s it for now folks!

Parichit Sharma

Graduate Computer Science Student

School of Informatics, Computing & Engineering (SICE)

Indiana University, Bloomington




Finding and fixing the ‘small and crucial’ issues in the DIPY.




Finding and fixing the issues: After a week of brainstorming and reading through the basic tutorials and documentation of DIPY, I discovered the following issues in the documentation and the code base.

Each of the reported issues is described below:

  1. Fixing the documentation of the workflows: The tutorial webpage for workflow creation in DIPY (workflow) did not mention importing the newly created workflow itself. It only mentioned importing the run_flow function from the flow_runner module. This works only when the workflow is called directly from the command line; it does not work if the workflow has to be wrapped in a separate Python file and called from elsewhere.

Solving the issue: I updated the documentation to include the required import statement.

Commit Link: Updated the workflow_creation.py 

This pull request has been successfully merged into the code base 🙂

2.  Displaying a nice and helpful message when a workflow is invoked without any inputs: DIPY requires the workflows to be invoked with certain input parameters, and both the number and the format of the inputs are strictly enforced.

Behavior: Invoking the workflow without any input parameters just resulted in an error trace without any helpful message for the user. (This stack trace was hard to decipher)

Solving the issue: This behavior was handled inside the argparse.py file and a conditional check was used to display the appropriate message to the user about missing parameters.

PR number: 1523

Commit Link:  Showing help when no input parameters are provided to the workflow
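The general idea of such a check can be sketched with the standard argparse module. This is not the actual patch: the functions build_parser and parse_or_show_help are names I made up for the example, the arguments simply mimic the dipy_append_text sample workflow, and DIPY builds its own parser differently.

```python
import argparse
import io


def build_parser():
    """Illustrative parser that mimics the sample append-text workflow."""
    parser = argparse.ArgumentParser(
        prog="dipy_append_text",
        description="Append two text files (illustrative parser only).",
    )
    parser.add_argument("input_file1", help="path to the first input file")
    parser.add_argument("input_file2", help="path to the second input file")
    parser.add_argument("--force", action="store_true",
                        help="overwrite existing output")
    parser.add_argument("--out_dir", default="", help="output directory")
    return parser


def parse_or_show_help(argv, stream=None):
    """Show the help text instead of failing when no arguments are given."""
    parser = build_parser()
    if not argv:
        # The conditional check: nothing was passed, so explain what is needed.
        parser.print_help(file=stream)
        return None
    return parser.parse_args(argv)


# No inputs: the help message is written to the buffer, nothing is parsed.
buffer = io.StringIO()
assert parse_or_show_help([], stream=buffer) is None
print(buffer.getvalue())

# Normal invocation: arguments are parsed as usual.
args = parse_or_show_help(["in1.txt", "in2.txt", "--force"])
print(args.input_file1, args.input_file2, args.force)
```

In a real script the list passed in would be sys.argv[1:].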

3. Suppressing the harmless h5py warnings: Due to the dependency of DIPY on certain features of the older version of h5py, the h5py package cannot be updated in the new release.

Behavior: There was always a ‘Future Warning’ from the h5py package whenever a workflow was invoked.

The root cause analysis: All the workflows make use of the run_flow function of the flow_runner module, so it was the right place to handle this warning. This is because run_flow is imported before any other imports in the workflow script.

Solving the issue: I created a custom exception handler in the flow_runner.py file to catch the ‘FutureWarning’. This suppressed the harmless (but annoying) warning from h5py.

PR number: 1523

Commit Link: Suppressing the ‘FutureWarning’ from the h5py package. 
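For readers who have not silenced a warning before, here is a minimal sketch using Python's standard warnings module. It shows the general technique only and is not the code from the pull request; noisy_import and call_quietly are hypothetical names, and the first one merely pretends to be an import that emits a FutureWarning.

```python
import warnings


def noisy_import():
    """Stand-in for an import that emits a FutureWarning."""
    warnings.warn("this behavior will change in a future release", FutureWarning)
    return "loaded"


def call_quietly(func):
    """Run func with FutureWarning suppressed, leaving other warnings alone."""
    with warnings.catch_warnings():
        # The filter only applies inside this block and is restored afterwards.
        warnings.simplefilter("ignore", category=FutureWarning)
        return func()


print(call_quietly(noisy_import))
```

Scoping the filter with catch_warnings keeps the suppression local, so a FutureWarning raised somewhere else still reaches the user.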

4. Catching the argument mismatch between the run method and the doc string: All workflows require strict documentation of the parameters provided to the run method. There are formatting restrictions imposed by adherence to the PEP8 code styling guidelines. Also, both the positional and the optional parameters need to be documented.

Behavior: The workflow exited with a cryptic error trace (usually difficult to understand). This happened whenever there was a mismatch between the number of parameters mentioned in the doc string and the run method. However, there was no conditional check for handling this behavior.

The root cause of the error: In the file base.py, the number of arguments in the doc string and in the run method were not compared to establish equal length (which is required), so the workflow simply led to a cumbersome error trace whenever that happened.

Solving the issue: I created a simple conditional check to ensure that the doc string parameters match exactly those of the run method, and raised a ValueError otherwise.

PR number: 1533

Commit Link: Mismatching arguments between the doc string and the run method
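A check of this kind can be sketched with the standard inspect module. The snippet below is an illustration under my own assumptions, not the code in base.py: documented_params is a deliberately simple parser for NumPy-style "name : type" lines, and the two example classes are invented for the demo.

```python
import inspect
import re


def documented_params(docstring):
    """Collect names from 'name : type' lines in the Parameters section."""
    names = []
    in_params = False
    lines = (docstring or "").splitlines()
    for index, line in enumerate(lines):
        text = line.strip()
        next_text = lines[index + 1].strip() if index + 1 < len(lines) else ""
        # A section header is a non-empty line underlined with dashes.
        is_header = bool(text) and bool(next_text) and set(next_text) == {"-"}
        if is_header:
            in_params = (text == "Parameters")
            continue
        if in_params:
            match = re.match(r"^(\w+)\s*:", text)
            if match:
                names.append(match.group(1))
    return names


def check_run_docstring(run_method):
    """Raise ValueError if the doc string and the signature disagree."""
    signature = inspect.signature(run_method)
    sig_params = [name for name in signature.parameters if name != "self"]
    doc_params = documented_params(inspect.getdoc(run_method))
    if sig_params != doc_params:
        raise ValueError(
            f"run() takes {sig_params} but the doc string documents {doc_params}"
        )


class GoodFlow:
    def run(self, input_file1, out_dir=""):
        """Do something.

        Parameters
        ----------
        input_file1 : string
            Path to the input file.
        out_dir : string, optional
            Output directory.
        """


class BadFlow:
    def run(self, input_file1, out_dir=""):
        """Do something.

        Parameters
        ----------
        input_file1 : string
            Path to the input file.
        """


check_run_docstring(GoodFlow.run)  # passes silently
try:
    check_run_docstring(BadFlow.run)
except ValueError as error:
    print(error)
```

The point is the readable message at the end: the user learns which parameter is undocumented instead of staring at a stack trace.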

That’s it for now folks!

Stay Tuned for a fun and powerful image registration workflow with DIPY



Sneak-Peek into the DIPY Workflows and Philosophy





Well, first things first: DIPY stands for Diffusion Imaging in Python. DIPY is a medical imaging software package meant to analyze and interpret the data generated by MRI systems (primarily brain images and supporting data such as system parameters and metadata). DIPY is an open source initiative (under the umbrella of the Python Software Foundation) and provides opportunities for scientific package implementation, powerful software engineering, exciting visualization techniques that leverage state-of-the-art hardware (GPU shaders and more) and data-driven analytics (algorithms to improve image registration and more).

My Work and Its Usefulness

I will be working on creating feature-rich and user-friendly workflows that will become part of the DIPY source code. DIPY has a significant collection of scientific algorithms that can be linked via custom Python scripts to create and deliver flexible workflows to the end user. Though DIPY is powerful in functionality, not all of its tutorials have their own workflows, well, not yet. After passing manual and automated validation and checks, these workflows will help medical experts, researchers, and doctors to quickly analyze MRI data in a standard manner.

Exploring the Code Base

So far, I have been going through the DIPY code base and learning to navigate its source code, that is, understanding how the code is structured and organized. In this context, Dr. Eleftherios Garyfallidis and Serge Koudoro, founder and core developer of DIPY respectively, have been very helpful. Now I have a clear understanding of how the files and data are organized in the code base.

A few hours and several test runs later, I realized why they created the introspective parser and where there is scope for quick improvement. We discussed a list of things to be done on a priority basis.

Also To be Added

A good amount of work will also be dedicated to testing the workflows on a variety of datasets and platforms. This will ensure that the code behaves as expected and, in turn, will add to the quality of the package.

A relatively challenging part of the assignment will be to integrate some visualization tool or intermediate output parsers to do a sanity check on the quality of intermediate output. This will prevent too many errors or too much troubleshooting down the line.

Closing for now 🙂

That’s it for now, folks.

Stay tuned for real development updates and exciting new workflows. Oh yes, there will be awesome visualization too.


DIPY GitHub Code Base

My Forked Repository
