Eleventh week of GSoC: Some more Datalad (complete and automatic flow now)

sappelhoff
Published: 08/11/2019

1. What did you do this week?

I have compiled a list for week 11 in my changelog here: https://github.com/sappelhoff/gsoc2019/blob/master/changelog.md#week-11

2. What is coming up next?

Next, I will continue to improve the mne-study-template and also work on a new release of MNE-BIDS.

3. Did you get stuck anywhere?

As in the week before, I got stuck a bit with Datalad. However, I finally fixed all problems, and I want to report the flow of my pipeline below. Enjoy!

Pipeline to get any dataset as a git-annex dataset

Using the following tools:

OSF: https://osf.io
osfclient: https://github.com/osfclient/osfclient
git-annex: https://git-annex.branchable.com/
datalad: https://www.datalad.org/
datalad-osf: https://github.com/templateflow/datalad-osf/
GitHub: https://github.com

Step 1 Upload data to OSF
1. install osfclient: `pip install osfclient` (see https://github.com/osfclient/osfclient)
2. make a new OSF repository from the website (you need to be registered)
3. copy the "key" from the new OSF repository, e.g., "3qmer" for the URL: "https://osf.io/3qmer/"
4. navigate to the directory that contains the directory you want to upload to OSF
5. create a `.osfcli.config` file by running `osf init`; the file is written into the current working directory
6. call `osf upload -r MY_DATA/ .` to upload your data, replacing MY_DATA with your upload directory name
7. instead of being prompted to input your password, you can define an environment variable OSF_PASSWORD with your password. This has the advantage that you could start an independent process without having to wait and leave your command line prompt open: `nohup osf upload -r MY_DATA/ . &`
8. NOTE: Recursive uploading using osfclient can be a bad experience. Check out this wrapper script for more control over the process: https://github.com/sappelhoff/gsoc2019/blob/master/misc_code/osfclient_wrapper.py
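The upload steps above can also be scripted. The following is only a minimal sketch, not part of osfclient: the helper names `osf_key_from_url`, `build_upload_command`, and `start_upload` are hypothetical, and the sketch assumes that `osf init` has already been run in the working directory and that the `osf` executable is on your PATH. It extracts the project "key" from an OSF URL and starts the recursive upload as a separate process, passing the password through the `OSF_PASSWORD` environment variable.

```python
import os
import subprocess
from urllib.parse import urlparse


def osf_key_from_url(url):
    """Return the project key, e.g. '3qmer' for 'https://osf.io/3qmer/'."""
    path_parts = [part for part in urlparse(url).path.split("/") if part]
    if not path_parts:
        raise ValueError(f"No project key found in URL: {url}")
    return path_parts[0]


def build_upload_command(data_dir):
    """Build the call from item 6: osf upload -r MY_DATA/ ."""
    return ["osf", "upload", "-r", data_dir.rstrip("/") + "/", "."]


def start_upload(data_dir, password):
    """Start the upload without blocking (hypothetical helper).

    Assumes `osf init` was already run here, so `.osfcli.config` exists.
    """
    # Pass the password via the environment so osfclient does not prompt for it
    env = dict(os.environ, OSF_PASSWORD=password)
    # start_new_session detaches the process from the terminal's session
    # (POSIX only), similar in spirit to the nohup call in item 7
    return subprocess.Popen(
        build_upload_command(data_dir), env=env, start_new_session=True
    )
```

Calling `start_upload("MY_DATA", "my-password")` would return right away while the upload keeps running. It does not add retries or progress reporting, so the wrapper script linked in item 8 is still the better choice for large uploads.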
Step 2 Make a git-annex dataset out of the OSF data
1. install datalad-osf: clone the repository with git and run `pip install -e .` inside it. NOTE: You will need the patch submitted here: https://github.com/templateflow/datalad-osf/pull/2
2. install datalad: `pip install datalad` and git-annex (e.g., via conda-forge)
3. create your data repository: `datalad create MY_DATA`
4. go there and download your OSF data using datalad-osf: `cd MY_DATA` ... then `python -c "import datalad_osf; datalad_osf.update_recursive(key='MY_KEY')"`, where MY_KEY is the "key" you copied in Step 1 above.
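The one-liner from item 4 can be wrapped in a small script that refuses to run outside of a Datalad dataset. This is a hedged sketch: `is_datalad_dataset` and `download_osf_data` are hypothetical helpers, and the check only looks for the `.datalad` directory that `datalad create` places in a new dataset. The only datalad-osf call used is `update_recursive(key=...)`, exactly as in the command above.

```python
import os
from pathlib import Path


def is_datalad_dataset(path):
    """Heuristic check: look for the `.datalad` directory of a dataset."""
    return (Path(path) / ".datalad").is_dir()


def download_osf_data(key, dataset_dir="."):
    """Run the datalad-osf call from item 4 inside `dataset_dir`."""
    if not is_datalad_dataset(dataset_dir):
        raise RuntimeError(
            f"{dataset_dir} does not look like a Datalad dataset; "
            "run `datalad create` first"
        )
    # Imported lazily: requires the patched datalad-osf from item 1
    import datalad_osf

    previous_dir = Path.cwd()
    os.chdir(dataset_dir)  # same effect as `cd MY_DATA`
    try:
        datalad_osf.update_recursive(key=key)
    finally:
        os.chdir(previous_dir)
```

For example, `download_osf_data("MY_KEY", "MY_DATA")` corresponds to `cd MY_DATA` followed by the `python -c` call.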
Step 3 Publish the git-annex dataset on GitHub
1. Make a fresh (empty) repository on GitHub: <repo_url>
2. Clone your datalad repo: `datalad install -s <local_repo> clone`
3. `cd clone`
4. `git annex dead origin`
5. `git remote rm origin`
6. `git remote add origin <repo_url>`
7. `datalad publish --to origin`
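The publishing commands above can be collected into one script so that the sequence stops at the first failure. This is a sketch under assumptions: `publish_commands` and `publish_to_github` are hypothetical helper names, the empty GitHub repository from item 1 must already exist, and `dry_run=True` only prints the commands instead of running them.

```python
import subprocess


def publish_commands(repo_url):
    """Return items 4 to 7 as argument lists, to be run inside the clone."""
    return [
        ["git", "annex", "dead", "origin"],
        ["git", "remote", "rm", "origin"],
        ["git", "remote", "add", "origin", repo_url],
        ["datalad", "publish", "--to", "origin"],
    ]


def publish_to_github(local_repo, repo_url, clone_dir="clone", dry_run=True):
    """Clone the local dataset and publish the clone (hypothetical helper)."""
    # Item 2 runs in the current directory, the remaining items in the clone
    steps = [(["datalad", "install", "-s", local_repo, clone_dir], None)]
    steps += [(cmd, clone_dir) for cmd in publish_commands(repo_url)]
    for cmd, cwd in steps:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on the first failing command
            subprocess.run(cmd, cwd=cwd, check=True)
    return [cmd for cmd, _ in steps]
```

Running `publish_to_github("MY_DATA", "<repo_url>")` first with the default `dry_run=True` lets you review the commands before anything is changed.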
Step 4 Get parts of your data (or everything) from the git-annex repository
1. `datalad install <repo_url>`
2. `cd <repo>`
3. to get only parts of the data: `datalad get <some_folder_or_file_path>`
4. to get everything: `datalad get .`
