Eleventh week of GSoC: Some more Datalad (complete and automatic flow now)

Published: 08/11/2019

1. What did you do this week?

I have compiled a list for week 11 in my changelog here: https://github.com/sappelhoff/gsoc2019/blob/master/changelog.md#week-11

2. What is coming up next?

Next, I will continue to improve the mne-study-template and also work on a new release of MNE-BIDS.

3. Did you get stuck anywhere?

As in the week before, I got stuck a bit with Datalad. However, I finally fixed all problems, and I report the full flow of my pipeline below. Enjoy!

Pipeline to get any dataset as a git-annex dataset

The pipeline consists of the following steps:

  1. Step 1: Upload data to OSF
    1. install osfclient: `pip install osfclient` (see https://github.com/osfclient/osfclient)
    2. make a new OSF repository from the website (you need to be registered for this)
    3. copy the "key" from the new OSF repository, e.g., "3qmer" for the URL: "https://osf.io/3qmer/"
    4. navigate to the directory that contains the directory you want to upload to OSF
    6. run `osf init` to create a `.osfcli.config` file; it is written into the current working directory
    7. call `osf upload -r MY_DATA/ .` to upload your data, replacing `MY_DATA` with the name of your upload directory
    8. to avoid being prompted for your password, define an environment variable `OSF_PASSWORD` containing it. This lets you start the upload as an independent background process without keeping your command line prompt open: `nohup osf upload -r MY_DATA/ . &`
    8. NOTE: Recursive uploading using osfclient can be a bad experience. Check out this wrapper script for more control over the process: https://github.com/sappelhoff/gsoc2019/blob/master/misc_code/osfclient_wrapper.py
  2. Step 2: Make a git-annex dataset out of the OSF data
    1. install datalad-osf: clone the repository and install it with `pip install -e .`. NOTE: you will need the patch submitted here: https://github.com/templateflow/datalad-osf/pull/2
    2. install datalad: `pip install datalad` and git-annex (e.g., via conda-forge)
    3. create your data repository: `datalad create MY_DATA`
    4. change into it and download your OSF data using datalad-osf: `cd MY_DATA`, then `python -c "import datalad_osf; datalad_osf.update_recursive(key='MY_KEY')"`, where `MY_KEY` is the "key" from Step 1 above
  3. Step 3: Publish the git-annex dataset on GitHub
    1. Make a fresh (empty) repository on GitHub: <repo_url>
    2. Clone your datalad repo: `datalad install -s <local_repo> clone`
    3. `cd clone`
    4. `git annex dead origin`
    5. `git remote rm origin`
    6. `git remote add origin <repo_url>`
    7. `datalad publish --to origin`
  4. Step 4: Get parts of your data (or everything) from the git-annex repository
    1. `datalad install <repo_url>`
    2. `cd <repo>`
    3. `datalad get <some_folder_or_file_path>` to get specific files
    4. `datalad get .` to get everything
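
The four steps above can be collected into a single shell sketch. It simply mirrors the commands listed above, assuming osfclient, datalad, git-annex, and the patched datalad-osf are installed; the placeholder names (`MY_DATA`, the OSF key, the repo URLs) are stand-ins you need to replace. The functions are only defined here, not executed:

```shell
#!/usr/bin/env bash
# Sketch of the full pipeline from the steps above. Placeholders
# (MY_DATA, the OSF key, repo URLs) must be replaced with real values.

# Step 1: upload a local directory to an OSF project.
upload_to_osf() {
    local upload_dir="$1"          # e.g., MY_DATA
    osf init                       # writes .osfcli.config into the CWD
    # OSF_PASSWORD avoids the interactive password prompt, so the
    # upload can run unattended in the background:
    nohup osf upload -r "${upload_dir}/" . &
}

# Step 2: create a datalad dataset and pull the OSF data into git-annex.
osf_to_gitannex() {
    local key="$1"                 # the OSF "key", e.g., 3qmer
    datalad create MY_DATA
    cd MY_DATA
    python -c "import datalad_osf; datalad_osf.update_recursive(key='${key}')"
}

# Step 3: publish the git-annex dataset to a fresh GitHub repository.
publish_to_github() {
    local repo_url="$1"
    datalad install -s MY_DATA clone
    cd clone
    git annex dead origin          # declare the local origin's annex dead
    git remote rm origin
    git remote add origin "${repo_url}"
    datalad publish --to origin
}

# Step 4: install the published dataset and fetch (parts of) the data.
get_from_gitannex() {
    local repo_url="$1"
    datalad install "${repo_url}"
    cd "$(basename "${repo_url}" .git)"
    datalad get .                  # or: datalad get <some_folder_or_file>
}
```

Sourcing the file and calling the functions one after another reproduces the pipeline end to end; keeping each step in its own function makes it easy to rerun a single step when something fails.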
