Some background: One of the important objectives of my GSoC project is to develop quality workflows to serve the scientific community better. Good workflows are crucial to enable the outreach and well-defined usage of the various features in the DIPY package.
How can workflows substantiate the outreach? DIPY contains implementations of many scientific algorithms that are used in routine analysis of MRI data. A few of these implementations are fairly straightforward and easy to grasp and use (credit also goes to Python’s intuitive syntax and to the open source community for contributing to the DIPY project).
However, outreach focuses on users who are less comfortable with programming, for example medical practitioners, life sciences experts, or users in academia who would like to leverage DIPY quickly to address their own research problems. This does not mean that they cannot implement their own packages (surely they can); DIPY, as a community project, depends on feedback and improvements from many such users.
The objective is to provide an end-to-end processing pipeline to the user with a minimal learning curve. In workflows, several individual components of DIPY (modules, functions) are combined in a well-defined manner and delivered with good software development practices. This abstracts the low-level details away from users while still allowing them to use DIPY.
Experienced users can explore the individual components of the workflows and gain fine-grained control by tweaking the parameters (these are listed in each workflow's help).
How can workflows ensure well-defined usage? Each workflow that combines multiple components also follows a rigorous testing and quality assurance procedure to check and validate the output from various intermediate components. This results in a well-tested series of steps to achieve a specific objective with DIPY.
These past two weeks I have been working on creating the image registration workflow while simultaneously fixing other issues in DIPY (see this post).
The Image Registration: Put simply, registration means aligning a pair of images (MRI data) so that downstream analysis can be performed on the registered image. This is needed because the raw data obtained from dMRI consist of moving images, which must be pre-processed before other types of analysis.
The registration of MRI data is a complex process with multiple options available for registering the images, for example:
A) Registration based on the Center of Mass.
B) Registration based on the Translation of Images.
C) Registration based on the Rigid body Transformation.
D) Full Affine Registration, which involves center of mass, translation, rigid body transformation, shear, and scaling of the data.
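To make option A concrete, here is a minimal toy sketch of center-of-mass alignment on a small 2D grid. It is not the DIPY implementation: the helper names (center_of_mass, com_translation) and the tiny list-of-lists "images" are assumptions made purely for illustration. The idea is only that the intensity-weighted centroid of the moving image is shifted onto that of the static image, which gives a cheap starting point that the later modes (translation, rigid, affine) can refine.

```python
def center_of_mass(image):
    """Return the intensity-weighted (row, col) centroid of a 2D image."""
    total = sum(sum(row) for row in image)
    r = sum(i * v for i, row in enumerate(image) for v in row) / total
    c = sum(j * v for row in image for j, v in enumerate(row)) / total
    return r, c


def com_translation(static, moving):
    """Shift (d_row, d_col) that moves the centroid of `moving` onto `static`."""
    sr, sc = center_of_mass(static)
    mr, mc = center_of_mass(moving)
    return sr - mr, sc - mc


# Toy data (hypothetical): the same 2x2 blob at two different positions.
static = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
moving = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

shift = com_translation(static, moving)
print(shift)  # (1.0, 1.0): move the blob one row down and one column right
```

Real MRI volumes are 3D and carry their own voxel-to-world affines, so an actual registration has to work in physical coordinates rather than on raw grid indices as this sketch does.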
Below is the link to the workflow that I have developed for registering the image data.
In the coming weeks, I will be improving the unit tests for this workflow. In addition to testing the expected behavior (correct output), the test cases will also check the failure cases (where an error is triggered intentionally).
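The two kinds of checks can be sketched roughly as follows. This is a generic illustration, not the actual test code for the registration workflow: the function pick_mode and the mode names in MODES are hypothetical stand-ins for a workflow that accepts a registration mode and rejects anything it does not recognize.

```python
# Hypothetical mode names, used only for this illustration.
MODES = ("com", "trans", "rigid", "affine")


def pick_mode(name):
    """Normalize a mode name, raising ValueError for unknown modes."""
    mode = name.lower()
    if mode not in MODES:
        raise ValueError(f"unknown registration mode: {name!r}")
    return mode


def test_valid_mode():
    # Expected behavior: a known mode is accepted and normalized.
    assert pick_mode("Rigid") == "rigid"


def test_invalid_mode():
    # Intentional error: an unknown mode must raise, not pass silently.
    try:
        pick_mode("warp")
    except ValueError as err:
        assert "warp" in str(err)
    else:
        raise AssertionError("expected ValueError for an unknown mode")


test_valid_mode()
test_invalid_mode()
```

Testing the failure path explicitly matters because a workflow that silently accepts a bad option would still produce an output file, and the user would have no way to know it is wrong.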
Commit Link: Testing the Image Registration Workflow
Together, the registration workflow and the testing framework will provide a standardized option for users to register images in various modes (and be assured that the output is generated only after passing multiple tests).
Documenting the Use Case(s) for IOIterator in the Workflow
As a good documentation practice, I also created multiple use cases for running a workflow with a combination of input and output parameters.
This was done specifically to check how output files are created depending on the location of the input files, the use of wildcards, and the parameters enabled in the workflow.
These use cases will serve as a comprehensive guide for users looking to learn about various usage scenarios of workflows.
The documentation can be found at the following link:
Commit Link: Documenting the use cases
Extract from the Document: (dipy_append_text is the sample workflow created for the purpose of this testing.)
Test case details (dipy_test_cases is the parent directory containing all the experiment directories (exp1, exp2, etc.) and the respective input files for testing):
1. Test case 1: Both input files are present in the same directory and no output directory path is provided.
   Directory: exp1 (experiment1)
   Command: dipy_append_text in1.txt in2.txt
   Output: An output file, ‘out_file.txt’, is written in the same directory.
   Optional flag: --force, which forces overwriting of the output file.
   Command: dipy_append_text in1.txt in2.txt --force
2. Test case 2: An output directory within the current directory is specified and the --force flag is used.
   Directory: exp1 (experiment1)
   Command: dipy_append_text in1.txt in2.txt --force --out_dir tmp
   Output: An output file (out_file.txt) is written in the directory ‘tmp’ within the exp1 directory.
3. Test case 3: Going one level up in the directory tree and executing the workflow with the input file paths.
   Command: dipy_append_text exp1/in1.txt exp1/in2.txt --force --out_dir tmp
   Output: An output file (out_file.txt) is written in the directory ‘tmp’ within the exp1 directory.
   Note: Because of the --force flag, the previous ‘tmp’ directory is overwritten.
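The path behavior in the three test cases above can be mimicked with a short sketch. This is not the IOIterator code from DIPY; resolve_output and may_write are hypothetical helpers, and the rule that an existing output is left alone unless the force flag is given is an assumption made for the illustration. The sketch only shows the pattern the test cases describe: a relative output directory is resolved against the directory of the input file, not against the directory the command is run from.

```python
import os


def resolve_output(first_input, out_dir="", out_file="out_file.txt"):
    """Hypothetical helper: build the output path from the input location."""
    if os.path.isabs(out_dir):
        # An absolute output directory is used as given.
        return os.path.join(out_dir, out_file)
    # A relative (or empty) output directory is placed next to the input.
    base = os.path.dirname(first_input)
    return os.path.normpath(os.path.join(base, out_dir, out_file))


def may_write(path, force=False):
    """Assumed rule: an existing output is only replaced when force is set."""
    return force or not os.path.exists(path)


# Test case 1: inputs in the current directory, no output directory.
print(resolve_output("in1.txt"))
# Test case 2: output directory 'tmp' inside the current directory.
print(resolve_output("in1.txt", out_dir="tmp"))
# Test case 3: run from one level up; output still lands in exp1/tmp.
print(resolve_output(os.path.join("exp1", "in1.txt"), out_dir="tmp"))
```

On a POSIX system the three calls print out_file.txt, tmp/out_file.txt, and exp1/tmp/out_file.txt, matching the output locations listed in the test cases.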
That’s it for now folks!
Graduate Computer Science Student
School of Informatics, Computing & Engineering (SICE)
Indiana University, Bloomington