Second week of GSoC: Description of two exemplary work projects

Published: 06/09/2019

In today's blog post I will describe two example projects that I have been working on during the last week. Finally, I will describe how these two examples relate to my overall goal in this GSoC.

Conversion of MNE-somato-data

This week I spent some time converting a dataset to comply with the Brain Imaging Data Structure (BIDS): the MNE-somato-data, which is used in several code examples and tutorials in the MNE-Python documentation.

The Brain Imaging Data Structure is an emerging standard for organizing and structuring neuroimaging recordings such as MRI, EEG, MEG, or iEEG data. Such a standard is invaluable for sharing data, performing quality analysis, and building automated pipelines.

Converting existing datasets to this new standard allows us to reap all of these benefits and build on them in the future.
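To give a rough idea of what "organized according to BIDS" means in practice, here is a minimal sketch of how a BIDS-style file path is assembled from key-value pairs such as subject and task. The helper `bids_path`, the root folder name `somato_bids`, and the labels used below are assumptions made for illustration only. They are not part of MNE-Python or any other library, and the sketch covers just a small subset of the naming rules in the specification.

```python
from pathlib import PurePosixPath


def bids_path(root, subject, task, datatype, suffix, extension,
              session=None, run=None):
    """Build a BIDS-style path from key-value entities (illustrative helper)."""
    # Entities appear in the file name as key-value pairs joined by underscores.
    entities = [("sub", subject), ("ses", session), ("task", task), ("run", run)]
    name = "_".join(f"{key}-{value}" for key, value in entities
                    if value is not None)
    filename = f"{name}_{suffix}{extension}"

    # Folder hierarchy: root / sub-<label> / [ses-<label>] / <datatype>
    folder = PurePosixPath(root) / f"sub-{subject}"
    if session is not None:
        folder = folder / f"ses-{session}"
    return folder / datatype / filename


# Hypothetical labels for a single-subject MEG recording
print(bids_path("somato_bids", "01", "somato", "meg", "meg", ".fif"))
# somato_bids/sub-01/meg/sub-01_task-somato_meg.fif
```

Because every file name carries the same predictable pieces of information, a script can locate a recording without knowing anything about the lab that produced it.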

However, the conversion is often not straightforward. In the particular case of the MNE-somato-data, I was facing a severe lack of documentation. As a result, the conversion from an arbitrary data structure to the BIDS standard was slower than expected. On the upside, the somato dataset is now much better documented, on top of being organized according to a sensible standard.

Autoreject documentation

The autoreject package is Python software to "clean" electrophysiology data such as EEG and MEG. It uses cross-validation to automatically find thresholds that can be used to reject or retain parts of the data. In addition, there is an algorithm to repair data that might otherwise be rejected (because it exceeds the cross-validated threshold).
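The idea of finding a rejection threshold through cross-validation can be sketched in a few lines of NumPy. This is a simplified, single-channel toy version written for illustration, not autoreject's actual implementation: the function `cv_threshold`, the simulated data, and the list of candidate thresholds are all assumptions. For each candidate, epochs whose peak-to-peak amplitude exceeds the threshold are dropped from the training folds, and the mean of the remaining epochs is compared against the median of the held-out fold.

```python
import numpy as np


def cv_threshold(epochs, candidates, n_folds=5):
    """Pick a peak-to-peak threshold by cross-validation (toy, one channel).

    epochs : array of shape (n_epochs, n_times)
    candidates : list of candidate thresholds
    """
    ptp = epochs.max(axis=1) - epochs.min(axis=1)
    folds = np.array_split(np.arange(len(epochs)), n_folds)
    errors = []
    for thresh in candidates:
        fold_errors = []
        for val_idx in folds:
            train_mask = np.ones(len(epochs), dtype=bool)
            train_mask[val_idx] = False
            # Keep only training epochs at or below the candidate threshold
            keep = train_mask & (ptp <= thresh)
            if not keep.any():
                fold_errors.append(np.inf)  # threshold rejects everything
                continue
            train_mean = epochs[keep].mean(axis=0)
            # The median of the held-out epochs is robust to outliers
            val_median = np.median(epochs[val_idx], axis=0)
            rmse = np.sqrt(np.mean((train_mean - val_median) ** 2))
            fold_errors.append(rmse)
        errors.append(np.mean(fold_errors))
    return candidates[int(np.argmin(errors))]


# Simulated data: 100 noisy epochs, 10 of them with a large artifact
rng = np.random.default_rng(0)
epochs = rng.normal(0.0, 1.0, size=(100, 50))
bad = rng.choice(100, size=10, replace=False)
epochs[bad, 20:30] += 50.0

candidates = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0, 100.0]
best = cv_threshold(epochs, candidates)
print("selected threshold:", best)
```

In this toy setting, a threshold that is too low throws away most of the data and gives a noisy average, while a threshold that is too high lets the artifacts leak into the average. The cross-validation error is smallest somewhere in between.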

When using a software package such as autoreject, the documentation of its inner workings is almost as important as the functionality of the software itself. This is especially true for the analysis of scientific data by researchers, who are often not trained to go through source code and work out the inner workings themselves.

The autoreject package has some documentation in the form of examples that show off the basic functionality. On top of that, there is a small FAQ section that addresses user needs beyond the basic functionality.

This week, I added a section that gives a general, intuitive understanding of the algorithm without referring directly to the code. Such an intuitive explanation can serve as a starting point for approaching the more mathematical explanations in the associated scientific publication.

Throughout this process, I have tried to follow the guidelines on "good documentation", which is always split into four parts: "Tutorials", "How-to guides", "Explanation", and "Reference".

[Image: good documentation picture]


How does this relate to my overall project?

My overall project goal is to enable or enhance automatic processing of neurophysiology datasets organized using BIDS. The conversion of the MNE-somato-data to BIDS provides me with a test case for analysis pipelines. And, as is already evident from its name, the autoreject package is a prime candidate for automatic processing of neurophysiology data. It is also simply a good idea to improve the documentation of software that you want other people to use.