Articles on sappelhoff's Bloghttps://blogs.python-gsoc.orgUpdates on different articles published on sappelhoff's BlogenSat, 24 Aug 2019 08:10:36 +0000Thirteenth week of GSoC: Final Checkinhttps://blogs.python-gsoc.org/en/sappelhoffs-blog/thirteenth-week-of-gsoc-final-checkin/<p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model"><strong>1. What did you do this week?</strong></p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model">I have compiled a list for week 13 in my changelog here: <a href="https://github.com/sappelhoff/gsoc2019/blob/master/changelog.md#week-13">https://github.com/sappelhoff/gsoc2019/blob/master/changelog.md#week-13</a></p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model">This was the final week of my GSoC 2019. I have written a final report here: <a href="https://github.com/sappelhoff/gsoc2019/blob/master/FINAL_REPORT.md">https://github.com/sappelhoff/gsoc2019/blob/master/FINAL_REPORT.md</a></p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model"><strong>2. What is coming up next?</strong></p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model">Next up, I will focus on my PhD work, hopefully making use of many of the features I helped to bring about!</p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model">I am sure that this will also entail making many bug reports ... and fixing them. It will probably take longer, though, because the focused GSoC time is over for now.</p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model"><strong>3. Did you get stuck anywhere?</strong></p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model">This week, we had many discussions on digitized position data of electrophysiology sensors. 
This discussion eventually led to the question of whether we want to support template data in BIDS at all ... or whether BIDS should be just for true, measured data.</p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model">See:</p> <ul> <li class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model"><a href="https://github.com/bids-standard/bids-specification/issues/318">https://github.com/bids-standard/bids-specification/issues/318</a></li> <li class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model"><a href="https://github.com/bids-standard/pyedf/issues/7">https://github.com/bids-standard/pyedf/issues/7</a></li> <li class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-420 cms-render-model"><a href="https://github.com/mne-tools/mne-bids/pull/244#discussion_r313958721">https://github.com/mne-tools/mne-bids/pull/244#discussion_r313958721</a></li> </ul>Stefan.Appelhoff@gmail.com (sappelhoff)Sat, 24 Aug 2019 08:10:36 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/thirteenth-week-of-gsoc-final-checkin/Twelfth week of GSoC: Getting ready for the final week of GSoChttps://blogs.python-gsoc.org/en/sappelhoffs-blog/twelveth-week-of-gsoc-getting-ready-for-the-final-week-of-gsoc/<p>My GSoC is soon coming to an end, so I took some time to write down what still needs to be done:</p> <p><strong>Making a release of MNE-BIDS</strong></p> <p>In the past months, there were substantial additions, fixes, and cosmetic changes made to the codebase and documentation of MNE-BIDS. The last release happened in April (about four months ago), and we were quite happy to observe some issues and pull requests raised and submitted by new users. 
With the next release we can provide some new functionality for this growing user base.</p> <p><strong>Handling coordinates for EEG and iEEG in MNE-BIDS</strong></p> <p>In MNE-BIDS, the part of the code that handles the writing of sensor positions in 3D space (=coordinates) is so far restricted to MEG data. Extending this functionality to EEG and iEEG data has been on the to-do list for a long time now. Fortunately, I have been learning a bit more about this topic during my GSoC, and Mainak has provided some starting points in an unrelated PR that I can use to finish this issue. (After the release of MNE-BIDS, though, to avoid cramming in too much last-minute content.)</p> <p><strong>Writing a data fetcher for OpenNeuro to be used in MNE-Python</strong></p> <p>While working with BIDS and M/EEG data, the need for good testing data has come up time and time again. For the mne-study-template we solved this issue with a combination of DataLad and OpenNeuro. Meanwhile, MNE-BIDS has its own dataset.py module ... however, we all feel like this module is duplicating the datasets module of MNE-Python and not advancing MNE-BIDS. Rather, it is confusing the purpose of MNE-BIDS.</p> <p>As a solution, we want to write a generalized data fetching function for MNE-Python that works with OpenNeuro ... without adding the DataLad (and hence Git-Annex) dependency. Once this fetching function is implemented, we can import it in MNE-BIDS and finally deprecate MNE-BIDS' dataset.py module.</p> <p><strong>Make a PR in MNE-Python that will support making Epochs for duplicate events (will fix ds001971 PR)</strong></p> <p>In MNE-Python, making data epochs is not possible if two events share the same time. This became apparent with the dataset ds001971 that we wanted to add to the mne-study-template pipeline: <a href="https://github.com/mne-tools/mne-study-template/pull/41">https://github.com/mne-tools/mne-study-template/pull/41</a>. 
There was a suggestion on how to solve this issue by merging the event codes that occurred at the same time. Once this fix is implemented in MNE-Python, we can use it to finish the PR in the mne-study-template.</p> <p><strong>Salvage / close the PR on more "read_raw_bids" additions</strong></p> <p>Earlier in this GSoC, I made a PR intended to improve the reading functionality of MNE-BIDS (<a href="https://github.com/mne-tools/mne-bids/pull/244">https://github.com/mne-tools/mne-bids/pull/244</a>). However, the PR was discussed controversially, because it was not leveraging BIDS and instead relied on introducing a dictionary as a container for keyword arguments.</p> <p>After lots of discussion, we agreed to solve the situation in a different way (by leveraging BIDS), and Mainak made some initial commits in that direction. However, as work progressed, the PR was dropped because other issues had higher priority.</p> <p>Before finishing my GSoC, I want to salvage what's possible from this PR and then close it ... and improve the original issue report so that the next attempt at this PR can rely on a more detailed objective.</p>Stefan.Appelhoff@gmail.com (sappelhoff)Mon, 19 Aug 2019 10:03:40 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/twelveth-week-of-gsoc-getting-ready-for-the-final-week-of-gsoc/Eleventh week of GSoC: Some more Datalad (complete and automatic flow now)https://blogs.python-gsoc.org/en/sappelhoffs-blog/eleventh-week-of-gsoc-some-more-datalad-complete-and-automatic-flow-now/<p><strong>1. What did you do this week?</strong></p> <p>I have compiled a list for week 11 in my changelog here: <a href="https://github.com/sappelhoff/gsoc2019/blob/master/changelog.md#week-11">https://github.com/sappelhoff/gsoc2019/blob/master/changelog.md#week-11</a></p> <p><strong>2. What is coming up next?</strong></p> <p>Next, I will continue to improve the mne-study-template and also work on a new release of MNE-BIDS.</p> <p><strong>3. 
Did you get stuck anywhere?</strong></p> <p>As in the week before, I got stuck a bit with Datalad. However, I finally fixed all problems and want to report the flow of my pipeline below. Enjoy!</p> <hr> <p><u>Pipeline to get any dataset as git-annex dataset</u></p> <p>Using the following tools:</p> <ul> <li><em>OSF</em>: <a href="https://osf.io">https://osf.io</a></li> <li><em>osfclient</em>: <a href="https://github.com/osfclient/osfclient">https://github.com/osfclient/osfclient</a></li> <li><em>git-annex</em>: <a href="https://git-annex.branchable.com/">https://git-annex.branchable.com/</a></li> <li><em>datalad</em>: <a href="https://www.datalad.org/">https://www.datalad.org/</a></li> <li><em>datalad-osf</em>: <a href="https://github.com/templateflow/datalad-osf/">https://github.com/templateflow/datalad-osf/</a></li> <li><em>Github</em>: <a href="https://github.com">https://github.com</a></li> </ul> <ol> <li>Step 1 Upload data to OSF <ol> <li>install osfclient: `pip install osfclient` (see https://github.com/osfclient/osfclient)</li> <li>make a new OSF repository from the website (you need to be registered)</li> <li>copy the "key" from the new OSF repository, e.g., "3qmer" for the URL: "https://osf.io/3qmer/"</li> <li>navigate to the directory that contains the directory you want to upload to OSF</li> <li>make a `.osfcli.config` file: `osf init` ... this file gets written into the current working directory</li> <li>call `osf upload -r MY_DATA/ .` to upload your data, replacing MY_DATA with your upload directory name</li> <li>instead of being prompted to input your password, you can define an environment variable OSF_PASSWORD with your password. This has the advantage that you could start an independent process without having to wait and leave your command line prompt open: `nohup osf upload -r MY_DATA/ . &amp;`</li> <li>NOTE: Recursive uploading using osfclient can be a bad experience. 
Check out this wrapper script for more control over the process: <a href="https://github.com/sappelhoff/gsoc2019/blob/master/misc_code/osfclient_wrapper.py">https://github.com/sappelhoff/gsoc2019/blob/master/misc_code/osfclient_wrapper.py</a></li> </ol> </li> <li>Step 2 Make a git-annex dataset out of the OSF data <ol> <li>install datalad-osf: git clone and use `pip install -e .` NOTE: You will need the patch submitted here: https://github.com/templateflow/datalad-osf/pull/2</li> <li>install datalad: `pip install datalad` and git-annex (e.g., via conda-forge)</li> <li>create your data repository: `datalad create MY_DATA`</li> <li>go there and download your OSF data using datalad-osf: `cd MY_DATA` ... then `python -c "import datalad_osf; datalad_osf.update_recursive(key='MY_KEY')"`, where MY_KEY is the "key" from step 1 above.</li> </ol> </li> <li>Step 3 Publish the git-annex dataset on GitHub <ol> <li>Make a fresh (empty) repository on GitHub: &lt;repo_url&gt;</li> <li>Clone your datalad repo: datalad install -s &lt;local_repo&gt; clone</li> <li>cd clone</li> <li>git annex dead origin  </li> <li>git remote rm origin</li> <li>git remote add origin &lt;repo_url&gt;</li> <li>datalad publish --to origin</li> </ol> </li> <li>Step 4 Get parts of your data (or everything) from the git-annex repository <ol> <li>datalad install &lt;repo_url&gt;</li> <li>cd &lt;repo&gt;</li> <li>datalad get &lt;some_folder_or_file_path&gt;</li> <li>datalad get .</li> </ol> </li> </ol> <p><u>Important sources / references</u></p> <ul> <li><a href="https://blogs.python-gsoc.org/en/sappelhoffs-blog/tenth-week-of-gsoc-git-annex-and-datalad/">https://blogs.python-gsoc.org/en/sappelhoffs-blog/tenth-week-of-gsoc-git-annex-and-datalad/</a></li> <li><a href="https://github.com/templateflow/datalad-osf/issues/1">https://github.com/templateflow/datalad-osf/issues/1</a></li> </ul>Stefan.Appelhoff@gmail.com (sappelhoff)Sun, 11 Aug 2019 19:44:16 
+0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/eleventh-week-of-gsoc-some-more-datalad-complete-and-automatic-flow-now/Tenth week of GSoC: git-annex and dataladhttps://blogs.python-gsoc.org/en/sappelhoffs-blog/tenth-week-of-gsoc-git-annex-and-datalad/<p>In the last weeks, Alex, Mainak, and I were working on making the mne-study-template compatible with the Brain Imaging Data Structure (BIDS). This process involved testing the study template on many different BIDS datasets, to see where the template is not yet general enough, or where bugs are hidden.</p> <p>To improve our testing, we wanted to set up a continuous integration service that automatically runs the template over different datasets for every git commit that we make. Understandably, however, all integration services (such as CircleCI) have a restriction on how much data can be downloaded in a single test run. This meant we needed lightweight solutions that can pull in only small parts of the datasets we wanted to test.</p> <p>And this is where git-annex and datalad enter the conversation.</p> <p><strong>git-annex</strong></p> <p><a href="https://git-annex.branchable.com/">git-annex</a> is a tool that allows managing large files with git. One could see git-annex as a competitor to <a href="https://git-lfs.github.com/">git-lfs</a> ("Large File Storage"), because both solve the same problem. They differ in their technical implementation and have different pros and cons. A good summary can be found in this Stack Overflow post: <a href="https://stackoverflow.com/a/39338319/5201771">https://stackoverflow.com/a/39338319/5201771</a></p> <p><strong>Datalad</strong></p> <p><a href="https://www.datalad.org/">Datalad</a> is a Python library that <em>"builds on top of git-annex and extends it with an intuitive command-line interface"</em>. 
Datalad can also be seen as a "portal" to many git-annex datasets openly accessible on the Internet.</p> <p><strong>Recipe: How to turn any online dataset into a GitHub-hosted git-annex repository</strong></p> <p>Requirements: git-annex, datalad, unix-based system</p> <p>Installing git-annex worked great using <a href="https://docs.conda.io/en/latest/miniconda.html">conda</a> and the <a href="https://anaconda.org/conda-forge/git-annex">conda-forge</a> package for git-annex:</p> <pre><code>conda install git-annex -c conda-forge</code></pre> <p>The installation of datalad is very simple via pip:</p> <pre><code>pip install datalad</code></pre> <p>Now find the dataset you want to turn into a git-annex repository. In this example, we'll use the Matching Pennies dataset hosted on OSF: <a href="https://osf.io/cj2dr/">https://osf.io/cj2dr/</a></p> <p>We now need to create a CSV file with two columns. Each row of the file will reflect a single file we want to have in the git-annex repository. In the first column we will store the file path relative to the root of the dataset, and in the second column we will store the download URL of that file.</p> <p>Usually, the creation of this CSV file should be automated using software. For OSF, we have the <a href="https://github.com/templateflow/datalad-osf">datalad-osf package</a> which can do the job. 
However, that package is still in development, so <a href="https://github.com/sappelhoff/eeg_matchingpennies/wiki/eeg_matchingpennies.py">I wrote my own function</a>, which involved picking out many download URLs and file names by hand :-(</p> <p>On OSF, the URLs are given by <code>https://osf.io/&lt;key&gt;/download</code>, where &lt;key&gt; depends on the file.</p> <p>See two example rows of my CSV (note the headers, which are important later on):</p> <pre><code>fpath, url
sub-05/eeg/sub-05_task-matchingpennies_channels.tsv, https://osf.io/wdb42/download
sourcedata/sub-05/eeg/sub-05_task-matchingpennies_eeg.xdf, https://osf.io/agj2q/download</code></pre> <p>Once your CSV file is ready, and git-annex and datalad are installed, it is time to switch to the command line.</p> <pre><code># create the git-annex repository
datalad create eeg_matchingpennies

# download the files in the CSV and commit them
datalad addurls mp.csv "{url}" "{fpath}" -d eeg_matchingpennies/

# print our files and the references where to find them
# will show a local address (the downloaded files) and a web address (OSF)
git annex whereis

# Make a clone of your fresh repository
datalad install -s eeg_matchingpennies clone

# go to the clone
cd clone

# disconnect the clone from the local data sources
git annex dead origin

# disconnect the clone from its origin
git remote rm origin

# print our files again. Notice how all references to
# the local files are gone. Only the web references persist
git annex whereis
</code></pre> <p>Now make a new empty repository on GitHub: <a href="https://github.com/sappelhoff/eeg_matchingpennies">https://github.com/sappelhoff/eeg_matchingpennies</a></p> <pre><code># add a new origin to the clone
git remote add origin https://github.com/sappelhoff/eeg_matchingpennies

# upload the git-annex repository to GitHub
datalad publish --to origin
</code></pre> <p>Now your dataset is ready to go! Try it out as described below:<br>  </p> <pre><code># clone the repository into your current folder
datalad install https://github.com/sappelhoff/eeg_matchingpennies

# go to your repository
cd eeg_matchingpennies

# get the data for sub-05 (not just the reference to it)
datalad get sub-05

# get only a single file
datalad get sub-05/eeg/sub-05_task-matchingpennies_eeg.vhdr

# get all the data
datalad get .
</code></pre> <p><strong>Acknowledgments and further reading</strong></p> <p>I am very thankful to <a href="https://github.com/kyleam">Kyle A. Meyer</a> and <a href="https://github.com/yarikoptic">Yaroslav Halchenko</a> for their support in this <a href="https://github.com/templateflow/datalad-osf/issues/1">GitHub issue thread</a>. If you are running into issues with my recipe, I recommend that you fully read that GitHub issue thread.</p> <p> </p>Stefan.Appelhoff@gmail.com (sappelhoff)Sun, 04 Aug 2019 08:17:42 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/tenth-week-of-gsoc-git-annex-and-dataladNinth week of GSoChttps://blogs.python-gsoc.org/en/sappelhoffs-blog/ninth-week-of-gsoc/<div class="lead"> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>1. 
What did you do this week?</strong></p> I have compiled a list for week 9 in my changelog here: <a href="https://github.com/sappelhoff/gsoc2019/blob/master/changelog.md#week-9">https://github.com/sappelhoff/gsoc2019/blob/master/changelog.md#week-9</a> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>2. What is coming up next?</strong></p> Next, I will mostly work on the mne-study-template. With Mainak, I discussed that the next step would be to implement a CI test suite. <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>3. Did you get stuck anywhere?</strong></p> <p>In the MNE-Python codebase there was a "magical" factor of 85 multiplied by a variable, and it was not documented where it came from. It took me a while to figure out (and verify!) that this is the average head radius (assuming an unrealistically spherical head) in millimeters. Now the documentation is much better, but it helped me learn once more that one has to either<br>  </p> <ul> <li>write clean code <ul> <li>e.g., instead of having the factor 85 there, make it a variable with the name `realistic_head_radius_mm` (or something like it)</li> </ul> </li> <li>write good documentation <ul> <li>e.g., make short but instructive comments, or more exhaustive documentation in the function or module docstrings</li> </ul> </li> </ul> <p> </p> <p>Probably a combination of both is best.</p> </div>Stefan.Appelhoff@gmail.com (sappelhoff)Mon, 29 Jul 2019 08:16:00 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/ninth-week-of-gsoc/Eighth week of GSoC: Mixed tasks and progresshttps://blogs.python-gsoc.org/en/sappelhoffs-blog/eighth-week-of-gsoc-mixed-tasks-and-progress/<p>Two-thirds of the GSoC program are already over - time is passing very quickly. 
This past week, we made some progress on making the mne-study-template usable with BIDS-formatted data.</p> <p>Alex has improved the flow substantially, with Mainak serving as the "Continuous Integration service", regularly running different datasets through the pipeline and reporting where they get stuck. My own tasks were very diverse this week:</p> <p><strong>MNE-BIDS maintenance</strong></p> <p>I fixed several bugs in MNE-BIDS that we found while working on the study template. For example:</p> <ul> <li><a href="https://github.com/mne-tools/mne-bids/pull/227">made `write_anat` and `get_head_mri_trans` more robust</a></li> <li><a href="https://github.com/mne-tools/mne-bids/pull/234">fixed handling of NA data in mne-bids</a></li> </ul> <p><strong>Reviewing and user support</strong></p> <p>Furthermore, I was very happy to see many issues raised on MNE-BIDS. The issues showed that more and more people are picking up MNE-BIDS and using it in their data analysis pipelines. However, that also meant that in the last week, I did more user support and reviewing of pull requests than usual.</p> <p>For example, <a href="https://github.com/mne-tools/mne-bids/pull/233">a nice pull request</a> that I reviewed was done by Marijn (who is also an MNE-Python contributor). He improved MNE-BIDS' find_matching_sidecar function by introducing a "race for the best candidate" of the matching sidecar file.</p> <p><strong>Work on mne-study-template</strong></p> <p>Finally, I also worked on the mne-study-template myself - however, my contributions were rather modest. 
I mostly cleaned up the configuration files, formatted testing data, and made workflows work where they got stuck.</p> <p>See for example <a href="https://github.com/mne-tools/mne-study-template/pull/35/commits/6f8502f9ce6980a6c80fc7d126b5f05fe4621fba">here</a>.</p> <p><br> <br> Next week, I want to work more on the <strong>mne-study-template</strong>.</p>Stefan.Appelhoff@gmail.com (sappelhoff)Sun, 21 Jul 2019 09:55:33 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/eighth-week-of-gsoc-mixed-tasks-and-progress/Seventh week of GSoC: Just a status reporthttps://blogs.python-gsoc.org/en/sappelhoffs-blog/seventh-week-of-gsoc-just-a-status-report/<p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>1. What did you do this week?</strong></p> <p>Main work:</p> <ul> <li>I improved MNE-BIDS's "read_raw_bids" function in <a href="https://github.com/mne-tools/mne-bids/pull/219">PR #219</a> by allowing it to automatically set channel types in a "raw" object by parsing an accompanying BIDS channels.tsv file</li> <li>I worked on the MNE-STUDY-TEMPLATE, <a href="https://gitter.im/mne-tools/mne-gsoc-2019-BIDS?at=5d2a03143596f56f8cd087e2">making it work for the first step "loading and filtering"</a> for a different dataset and modality than it was intended for</li> <li>I started a PR (#221) to expose the MNE-BIDS "copyfile functions" to the command line interface</li> </ul> <p>Other work:</p> <ul> <li>User support in MNE-BIDS and MNE-Python over the issues</li> <li>Bugfix in MNE-BIDS (<a href="https://github.com/mne-tools/mne-bids/pull/217">PR #217</a>)</li> <li>Made dev docs accessible for MNE-BIDS via the CircleCI API (<a href="https://github.com/mne-tools/mne-bids/pull/216">PR #216</a>)</li> <li>... and lots of other stuff. 
As usual, I keep my log on my <a href="https://github.com/sappelhoff/gsoc2019/blob/master/changelog.md">GSoC repository</a></li> </ul> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>2. What is coming up next?</strong></p> <p>I will ...</p> <ul> <li>finish the PR on improving the MNE-BIDS command line (including docs)</li> <li>Go back to the MNE-STUDY-TEMPLATE and try to make it work for the EEG data beyond the filtering</li> <li>Make a dedicated example to MNE-BIDS' read_raw_bids function</li> <li>allow read_raw_bids to read the digitization files accompanying a raw data file</li> </ul> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>3. Did you get stuck anywhere?</strong></p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">As usual, there were many minor points where I got stuck. And as usual, I got lots of support from my mentoring team. This week, there was nothing serious however :-)<br> <br> Some examples:</p> <ul> <li class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">Trying to kill the warnings MNE-BIDS currently throws when running the tests</li> <li class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">Fighting with Freesurfer and the MNE bindings to Freesurfer</li> </ul>Stefan.Appelhoff@gmail.com (sappelhoff)Sun, 14 Jul 2019 08:13:55 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/seventh-week-of-gsoc-just-a-status-report/Sixth week of GSoC: On taking breaks and abstaining from rewriting old codehttps://blogs.python-gsoc.org/en/sappelhoffs-blog/sixth-week-of-gsoc-on-taking-breaks-and-abstaining-from-rewriting-old-code/<p>This past week I finally finished the big <a href="https://github.com/mne-tools/mne-bids/pull/211">Pull Request on coordinate systems and writing T1 MRI data for BIDS</a>. 
Right after the PR was done, I felt like taking a break and not immediately starting to code again. Usually, that works quite well in my life as a PhD student: There is a large diversity of non-coding tasks, ranging from reading, over writing, to simply recording data (very practical work). As a Google Summer of Code student, I currently perceive much less diversity of tasks: There are lots of features to be implemented ... and as soon as one feature gets done, the next one should be tackled.</p> <ul> <li>How do other software developers take breaks (or when)?</li> <li>Do they even feel like taking a break after finishing a certain feature?</li> <li>Or is this an issue too individual to be generally answered?</li> <li>... or could it be that my perception of the different coding tasks was a bit too coarse last week and that "implementing features" is more diverse than it sounds?</li> </ul> <p>Anyhow, I overcame my short period of lower motivation and started to work again on the mne-study-template, just to face the next challenge: <strong>Rather than iteratively improve the codebase, I felt the urge to completely rewrite it from scratch</strong>.</p> <p>Some background: I did not design or implement the codebase so far, so everything is rather new to me. I quickly realized that the study template is very biased towards specific types of data to be processed ... and also very biased towards a specific structure that the data should be set up in. With my job being to make the mne-study-template more general and rely on a data standard rather than arbitrary data structures, a re-write seemed most efficient to me. 
Fortunately, I remembered this quote by Joel Spolsky from his blog entry <a href="https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/">Things you should never do, Part 1</a><em>:</em></p> <p><em>"Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand."</em></p> <p>The takeaway of the post is that rewriting an existing codebase from scratch is rarely a good idea.</p> <p>So next week, I'll dig further into the study template and start with iterative improvements.</p>Stefan.Appelhoff@gmail.com (sappelhoff)Sun, 07 Jul 2019 17:03:29 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/sixth-week-of-gsoc-on-taking-breaks-and-abstaining-from-rewriting-old-code/Fifth week of GSoC: Coordinate Systems and Transformationshttps://blogs.python-gsoc.org/en/sappelhoffs-blog/fifth-week-of-gsoc-coordinate-systems-and-transformations/<p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>1. What did you do this week?</strong></p> <p>This week's primary work related to work I started in my <a href="https://blogs.python-gsoc.org/en/blogs/sappelhoffs-blog/second-week-of-gsoc-description-of-two-exemplary-work-projects/">second week of GSoC</a>. 
There, I started to convert MNE-Python's "somato" dataset to the BIDS standard, so that I could use it as an easy test case for improving BIDS-MNE pipelines.</p> <p>That work <a href="https://github.com/mne-tools/mne-python/pull/6414#pullrequestreview-247897845">was then halted</a>, because we realized that some files could not yet be saved according to the BIDS standard, because the specification does not cover them (as of yet).</p> <p>Thanks to an <a href="https://github.com/mne-tools/mne-bids/issues/210">idea by Alex</a>, this week was dedicated to implementing code that can quickly recalculate all files, without having to save them --> thus, we achieve full BIDS compatibility.</p> <p>Let's have a concrete summary in bullet points:</p> <ul> <li>When handling MEG data, we are often dealing with three different coordinate systems <ul> <li>One system to specify the head of the study participant</li> <li>One system to specify the sensors of the MEG machine</li> <li>One system that specifies the MRI scan of the study participant's head</li> </ul> </li> <li>For source-space analyses, we need to align these coordinate systems in a process called coregistration</li> <li>The coregistration is achieved through transformation matrices, which specify how points have to be rotated and translated to fit from one system into the other</li> <li>In MNE-Python, these transformation matrices are called `trans` ... 
and currently, BIDS has no fixed way of specifying how/where to save these as files</li> <li>In BIDS, however, we DO know how to save anatomical landmarks such as the nasion and the left and right preauricular points</li> <li>Thus, we simply save all of the points in their respective coordinate systems, and then call a function that calculates the `trans` by fitting the points to each other</li> </ul> <p>This sounds more straightforward than it turned out to be, and I spent nearly the whole week wrapping my head around concepts, private functions, <a href="https://github.com/mne-tools/mne-python/pull/6494">improving docs</a>, and <a href="https://github.com/mne-tools/mne-bids/pull/211">implementing the necessary code</a>.</p> <p>It's still not completely finished, but we are getting close.</p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>2. What is coming up next?</strong></p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">Next week, I hope to finalize the work on coordinate systems and transformations. Then I will finally start to make a BIDS version of the mne-study-template. Perhaps starting with the first steps, instead of tackling all steps (including source localization) at the same time.</p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">I realized that currently the mne-study-template is quite MEG-centric. I will see whether that will make me run into problems.</p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>3. 
Did you get stuck anywhere?</strong></p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">I got stuck at several points while working out coregistration, as is documented in several posts I made:</p> <ul> <li class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><a href="https://gitter.im/mne-tools/mne-gsoc-2019-BIDS?at=5d14e62f6e07c2047072c4af">gitter chat conversation</a></li> <li class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><a href="https://github.com/mne-tools/mne-bids/issues/210#issuecomment-506403864">big github issue comment</a></li> </ul> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">Luckily, I received lots of helpful comments from Mainak, Eric, and Alex ... so I made some progress regardless of the challenges. :-)</p>Stefan.Appelhoff@gmail.com (sappelhoff)Sun, 30 Jun 2019 17:18:25 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/fifth-week-of-gsoc-coordinate-systems-and-transformations/Fourth week of GSoC: Second half of "short GSoC pause" (due to summer school), and getting back to work - Issues with the BIDS-validatorhttps://blogs.python-gsoc.org/en/sappelhoffs-blog/fourth-week-of-gsoc-second-half-of-short-gsoc-pause-due-to-summer-school-and-getting-back-to-work-issues-with-the-bids-validator/<p>The past week I finished the summer school that I was running from the 11th of June until the 19th of June. It was a success given the reactions from all participants<sup>1</sup>. However, I was very happy when I could return to my GSoC project and coding starting last Thursday.</p> <p>I started by adding a <a href="https://github.com/mne-tools/mne-bids/pull/209">feature to MNE-BIDS</a>: When reading a raw file into a Python object, we want to automatically scan an accompanying metadata file (channels.tsv) to populate the Python object with information about bad channels in the data. 
When implementing the feature, most of the problems I encountered were due to an interaction with the <a href="https://github.com/bids-standard/bids-validator">BIDS-validator</a>, which I want to dedicate today's blog post to.</p> <p><strong>The BIDS-validator</strong></p> <p>I have written about BIDS before: It's a standard for organizing neuroimaging data. A standard can only be a standard when there is a set of <em>testable</em> rules to follow. The BIDS-validator is software that automatically checks a dataset for its compliance with the BIDS set of testable rules. The current BIDS-validator is written in JavaScript, which offers a unique advantage: It can be run <em>locally</em> inside a browser (see <a href="https://bids-standard.github.io/bids-validator/">here</a>), that is, no files are uploaded. This way, users of BIDS can employ the BIDS-validator without having to download software. Yet, for users with some programming experience, the BIDS-validator can also be downloaded as a command-line tool to run on Node.js.</p> <p>Alas, the big advantage of having the BIDS-validator implemented in JavaScript also comes at a cost: The programming language itself. With BIDS being a standard for scientific data, most of the user base consists of researchers. Only a fraction of researchers in the field of neuroscience is well versed in JavaScript, with the lingua francas of the field being Matlab and Python (and, in my experience, increasingly Python and less and less Matlab). This means that open-source contributions from researchers to the BIDS-validator are limited, and the BIDS-validator development relies on a small core of contributors and, to some extent, on contributions from a commercial company, funded through grants given to BIDS. The resulting problem is that the BIDS-validator often lags behind the development of the standard ... 
or that not all rules are tested to an appropriate extent.</p> <p>Some rules of BIDS are implemented in the form of regular expressions (see <a href="https://github.com/bids-standard/bids-validator/tree/master/bids-validator/bids_validator/rules">here</a>), and are thus "programming language agnostic". This is a great starting point for BIDS-validators implemented in other languages, and there is some <a href="https://pypi.org/project/bids-validator/">limited Python support</a>.</p> <p>Thinking about this, I often find myself going down the road of writing a complete BIDS-validator in Python. The advantage would be obvious: Much easier development! And the drawback that it wouldn't be as easily available from the browser as the current JavaScript implementation would be negligible for anyone who can install a Python package. Yet, as soon as we have more than one validator, we need to ensure that they produce exactly the same results ... and that could lead to another set of problems very soon.</p> <p>It seems there is no easy way out of this situation. 
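To make the "rules as regular expressions" idea concrete, here is a heavily simplified, hypothetical pattern for an EEG file path — the real rules in the bids-validator repository are much stricter; this sketch only illustrates why regex rules port well across languages:

```python
import re

# Hypothetical, simplified rule for a BIDS EEG data file path,
# e.g. /sub-01/eeg/sub-01_task-rest_eeg.vhdr
# The named group enforces that the subject label in the directory
# matches the one in the filename.
EEG_FILE = re.compile(
    r"^/sub-(?P<sub>[0-9a-zA-Z]+)"
    r"/eeg"
    r"/sub-(?P=sub)_task-[0-9a-zA-Z]+_eeg\.(vhdr|edf|set|bdf)$"
)

print(bool(EEG_FILE.match("/sub-01/eeg/sub-01_task-rest_eeg.vhdr")))  # True
print(bool(EEG_FILE.match("/sub-01/eeg/sub-02_task-rest_eeg.vhdr")))  # False (subject mismatch)
```

Apart from small syntax differences (e.g., how named backreferences are written), such a pattern can be shared between validators written in different languages.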
For me, that means that while I develop features with Python during my GSoC, I will occasionally have to spend disproportionate amounts of time debugging and enhancing the BIDS-validator with its codebase in JavaScript.</p> <hr> <p><sup>1</sup>Although Eric from MNE-Python told me that <a href="https://gitter.im/mne-tools/mne-gsoc-2019-BIDS?at=5d0bc58a1e35ef14b686bba6">"Success is measured by how many people you convinced to use and contribute to MNE-Python :)"</a></p>Stefan.Appelhoff@gmail.com (sappelhoff)Mon, 24 Jun 2019 08:22:21 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/fourth-week-of-gsoc-second-half-of-short-gsoc-pause-due-to-summer-school-and-getting-back-to-work-issues-with-the-bids-validator/Third week of GSoC: Conference discussions and short GSoC pause due to summer schoolhttps://blogs.python-gsoc.org/en/sappelhoffs-blog/third-week-of-gsoc-conference-discussions-and-short-gsoc-pause-due-to-summer-school/<div class="lead"> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>1. What did you do this week?</strong></p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">The beginning of this week (including Saturday and Sunday of the previous week) I was in Rome at the <a href="https://www.humanbrainmapping.org/i4a/pages/index.cfm?pageID=3882&amp;activateFull=true">2019 annual meeting</a> of the Organization for Human Brain Mapping.</p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">This was an exciting event, where I met multiple colleagues working on the two things that connect in my GSoC: <a href="https://bids.neuroimaging.io/">BIDS</a> and <a href="http://github.com/mne-tools/mne-python/">MNE-Python</a>. 
I had the chance to discuss both BIDS and the mne-study-template:</p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">Regarding BIDS, we discussed points like:</p> <ul> <li class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">The current governance model</li> <li class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">Community involvement</li> <li class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">How to progress with important topics like derivative data ... both for MEEG and MRI modalities</li> </ul> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">For the mne-study-template, we discussed that this should become the main focus of my GSoC: Making the pipeline more robust and easy to use with all that we can draw on from BIDS.</p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>2. What is coming up next?</strong></p> <ul> <li>BIDS conversion of the <a href="https://datashare.is.ed.ac.uk/handle/10283/2189?show=full">LIMO data</a> in cooperation with <a href="https://blogs.python-gsoc.org/en/blogs/josealaniss-blog/">Jose Alanis</a> (<a href="https://github.com/josealanis">@josealanis</a>)</li> <li>tackling an issue in mne-bids: <a href="https://github.com/mne-tools/mne-bids/issues/182">https://github.com/mne-tools/mne-bids/issues/182</a></li> <li>thinking about where <a href="https://github.com/mne-tools/mne-bids">mne-bids</a> can be used in the <a href="https://github.com/mne-tools/mne-study-template">mne-study-template</a> to simplify the workflow and make it more robust at the same time</li> </ul> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model"><strong>3. 
Did you get stuck anywhere?</strong></p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-73 cms-render-model">The third week of my GSoC went as planned; however, I want to note that on most days I had to organize/lead a <a href="https://www.mpib-berlin.mpg.de/en/research/adaptive-rationality/summer-institute-on-bounded-rationality">summer school at my institute</a>. This has been part of my <a href="https://blogs.python-gsoc.org/media/proposals/appelhoff_gsoc2019.pdf">original project plan</a> from the beginning (see week 3 in the schedule), so it did not come as a surprise to anyone. Still, I will be very happy to go back to coding once the summer school event is concluded!</p> </div>Stefan.Appelhoff@gmail.com (sappelhoff)Sun, 16 Jun 2019 07:33:22 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/third-week-of-gsoc-conference-discussions-and-short-gsoc-pause-due-to-summer-school/Second week of GSoC: Description of two exemplary work projectshttps://blogs.python-gsoc.org/en/sappelhoffs-blog/second-week-of-gsoc-description-of-two-exemplary-work-projects/<p>In today's blog post I will describe two example projects that I have been working on during the last week. Finally, I will describe how these two examples relate to my overall goal in this GSoC.</p> <p><strong>Conversion of MNE-somato-data</strong></p> <p>This week I spent some time converting a dataset to comply with the <a href="http://bids.neuroimaging.io">Brain Imaging Data Structure</a>: The <a href="https://martinos.org/mne/stable/manual/datasets_index.html?#somatosensory">MNE-somato-data</a>, which is used in several code examples and tutorials for the MNE-Python documentation.</p> <p>The Brain Imaging Data Structure is an emerging standard for how to organize and structure neuroimaging data recordings such as MRI, EEG, MEG, or iEEG data. 
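A core part of that organization is a predictable file-naming scheme built from key-value "entities". As a toy sketch of the naming idea (the subject/session/task values below are invented, and MNE-BIDS provides real helpers for this):

```python
# Toy sketch of BIDS-style file naming from entity key-value pairs.
# Entity values are invented; real BIDS names follow a fixed entity order
# defined by the specification.
def bids_basename(entities, suffix, extension):
    """Join ordered (key, value) entity pairs into a BIDS-style filename."""
    parts = [f"{key}-{value}" for key, value in entities]
    return "_".join(parts + [suffix]) + extension

name = bids_basename(
    entities=[("sub", "01"), ("ses", "meg"), ("task", "somato")],
    suffix="meg",
    extension=".fif",
)
print(name)  # sub-01_ses-meg_task-somato_meg.fif
```

Because every filename is built from the same entity vocabulary, tools can parse any BIDS dataset without per-dataset configuration.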
Such a standard is invaluable for improving the sharing of data, performing quality analyses, and building automated pipelines.</p> <p>Converting existing datasets to this new standard allows us to reap all of these benefits and build on them in the future.</p> <p>However, the conversion is often not very straightforward. In the particular case of the MNE-somato-data, I was facing a severe lack of documentation. Thus, the conversion from an arbitrary data structure to the BIDS standard was slower than expected, yet now the somato dataset has much better documentation on top of being organized in a sensible standard.</p> <p><br> <strong>Autoreject documentation</strong></p> <p>The <a href="https://github.com/autoreject/autoreject">autoreject package</a> is Python software to "clean" electrophysiology data such as EEG and MEG. It uses a process of cross-validation to automatically find thresholds that can be used to reject or retain parts of the data. In addition, there is an algorithm to repair data that would otherwise be rejected (because it exceeds the cross-validated threshold).</p> <p>When using a software package such as autoreject, the documentation of the inner workings is almost as important as the functionality of the software itself: Especially when it comes to the analysis of scientific data by researchers, who are often not trained to go through source code and understand the inner workings themselves.</p> <p>The autoreject package has some documentation in the form of examples that show off the basic functionality. On top of that, there is a small FAQ section that addresses user needs beyond getting information about basic functionality.</p> <p>This week, I added a section on the general understanding of the algorithm, not directly related to code. 
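To give a flavour of the underlying idea: rejecting epochs by a fixed peak-to-peak threshold can be sketched in a few lines of NumPy. This is a deliberately simplified illustration with made-up data — autoreject's contribution is precisely that it learns such thresholds per channel via cross-validation instead of requiring a hand-picked value:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Fake data: 5 epochs x 2 channels x 100 time samples (amplitudes in volts).
epochs = rng.normal(scale=1e-6, size=(5, 2, 100))
epochs[3, 0, 50] += 1e-4  # inject a large artifact into epoch 3

# Hand-picked peak-to-peak threshold -- autoreject would learn this instead.
threshold = 5e-5

# Peak-to-peak amplitude per epoch and channel; keep an epoch only if
# every channel stays below the threshold.
ptp = epochs.max(axis=-1) - epochs.min(axis=-1)
keep = (ptp < threshold).all(axis=-1)
print(keep.tolist())  # [True, True, True, False, True]
```

Repairing (rather than dropping) a marked epoch, as autoreject also does, would then be a second step on top of this detection logic.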
Providing this intuitive explanation up front makes it easier to approach the more mathematical explanations found in the associated <a href="https://www.sciencedirect.com/science/article/pii/S1053811917305013">scientific publication</a>.</p> <p>Throughout this process, I have tried to follow the guidelines on "<a href="https://www.divio.com/blog/documentation/">good documentation</a>", which is always split into four parts: "Tutorials", "How-to guides", "Explanation", and "Reference".</p> <p><img alt="good documentation picture" height="339" src="https://i.stack.imgur.com/9uc2M.png" width="620"></p> <p><strong>How does this relate to my overall project?</strong></p> <p>My <a href="https://blogs.python-gsoc.org/media/proposals/appelhoff_gsoc2019.pdf">overall project goal</a> is to enable or enhance automatic processing of neurophysiology datasets organized using BIDS. The conversion of the MNE-somato-data to BIDS provides me with a test case for analysis pipelines. And, as is already evident from its name, the autoreject package is a prime candidate for automatic processing of neurophysiology data, and it is a good idea to improve the documentation of software that you want other people to use.</p>Stefan.Appelhoff@gmail.com (sappelhoff)Sun, 09 Jun 2019 06:13:09 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/second-week-of-gsoc-description-of-two-exemplary-work-projects/First week of GSoC: Going down several rabbit holeshttps://blogs.python-gsoc.org/en/sappelhoffs-blog/first-week-of-gsoc-going-down-several-rabbit-holes/<p><strong>1. What did you do this week?</strong></p> <p>This week was characterized by many different smaller tasks, such as:</p> <ol> <li>improvements to documentation,</li> <li>fixing of bugs (typo-bugs),</li> <li>speeding up continuous integration through caching,</li> <li>opening issues to discuss potential APIs for analysis pipelines,</li> <li>... 
and some more</li> </ol> <p><br> To track my progress in GSoC, I have made a repository: <a href="https://github.com/sappelhoff/gsoc2019">github.com/sappelhoff/gsoc2019</a>, where I host a <a href="https://github.com/sappelhoff/gsoc2019/blob/master/changelog.md">changelog file</a> that contains each issue/PR/task that I have worked on, divided by weeks and days.<br> <br> For my overall project, the most important work was probably opening an <a href="https://github.com/sappelhoff/gsoc2019/issues/1">issue</a> to discuss potential APIs for analysis pipelines. I suggested a JSON-file-centered approach, and Mainak (my mentor) pointed me to several existing solutions. In our chat on <a href="https://gitter.im/mne-tools/mne-gsoc-2019-BIDS">Gitter</a> we later agreed to target the <a href="https://github.com/mne-tools/mne-study-template">mne-study-template</a> and improve it, before attempting to program a new pipeline from scratch.</p> <p><br> <strong>2. What is coming up next?</strong></p> <ul> <li>Conversion of MNE testing datasets to the <a href="http://bids.neuroimaging.io/">BIDS</a> standard to have a good set of data to test pipelines on</li> <li>Applying code from <a href="http://github.com/mne-tools/mne-bids/">MNE-BIDS</a> to the mne-study-template to see where we can improve it</li> <li>Finishing an addition to the documentation of the autoreject package: <a href="https://github.com/autoreject/autoreject/issues/144">https://github.com/autoreject/autoreject/issues/144</a></li> </ul> <p>Next week I will also travel to Rome for the <a href="https://www.humanbrainmapping.org/i4a/pages/index.cfm?pageid=3882&amp;pageid=3900">OHBM conference</a>, where I will meet Mainak and Alex, who are mentoring me during this GSoC.</p> <p><br> <strong>3. Did you get stuck anywhere?</strong></p> <p>I do not feel like I "got stuck" with anything in particular, but neither did I make a good "first step" with my project. 
As indicated in the title of this post, I would always start doing something and then get sidetracked by minor issues that I wanted to fix first. Each of these fixes ended up being a big time investment. It feels good to fix minor issues, but it should not distract me from the overall goal of the GSoC :-)</p>Stefan.Appelhoff@gmail.com (sappelhoff)Sun, 02 Jun 2019 14:36:13 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/first-week-of-gsoc-going-down-several-rabbit-holes/Google Summer of Code (GSoC) 2019: Analysis Pipelines and BIDShttps://blogs.python-gsoc.org/en/sappelhoffs-blog/google-summer-of-code-gsoc-2019-analysis-pipelines-and-bids/<pre>Dear Python and GSoC Community, this year I will take part in the Google Summer of Code (GSoC) to dedicate three months of coding towards improving neuro-data analysis with <a href="https://martinos.org/mne/stable/index.html">MNE-Python</a>. I am a PhD student in my second year, mostly working with EEG data in the domain of human decision making. In my free time I contribute to open source software, and more recently I have become a maintainer for the "<a href="https://bids.neuroimaging.io/">Brain Imaging Data Structure</a>" (BIDS), an emerging standard for organizing neuroimaging data. In my GSoC project with MNE-Python, I will be drawing on the "Brain Imaging Data Structure" to build automated, standardized analysis pipelines. Stay tuned! :-) If you want to get in touch, feel free to reach out via <a href="https://gitter.im/mne-tools/mne-gsoc-2019-BIDS">Gitter</a> or GitHub (<a href="https://github.com/sappelhoff/">@sappelhoff</a>). Cheers, Stefan</pre>Stefan.Appelhoff@gmail.com (sappelhoff)Tue, 28 May 2019 19:58:18 +0000https://blogs.python-gsoc.org/en/sappelhoffs-blog/google-summer-of-code-gsoc-2019-analysis-pipelines-and-bids/