NeuroDataPub: NCCR-SYNAPSY Neuroimaging Dataset Publishing Tool

Latest released version: v0.4

This tool is developed by the Connectomics Lab at the University Hospital of Lausanne (CHUV) for use within the lab and within the National Centre of Competence in Research (NCCR) “SYNAPSY – Synaptic Bases of Mental Diseases” NCCR-SYNAPSY, as well as for open-source software distribution.


Introduction

NeuroDataPub is a neuroimaging dataset publishing tool written in Python and built on top of Datalad and git-annex. It lowers the barriers for NCCR-SYNAPSY members to adopt Datalad to manage and publish, privately or publicly, their dataset repository on GitHub and the annexed files on their SSH data server, in order to fulfill the implemented Neuroimaging Data Management Plan.

Since v0.3, you can use either (1) a server accessible via ssh or (2) the Open Science Framework (OSF) platform, as a git-annex special remote, to host your annexed files.

Since v0.4, NeuroDataPub can handle datasets whether or not they follow the Brain Imaging Data Structure (BIDS) standard.

_images/neurodatapub_illustration.png

NeuroDataPub comes with its graphical user interface, aka the NeuroDataPub Assistant, created to facilitate:

  • the configuration of the siblings,

  • the creation of the JSON configuration files, as well as

  • the execution of NeuroDataPub in the three different modes, and

  • the creation of a Linux shell script for later execution where all commands are recorded.

Acknowledgment

If you are using NeuroDataPub in your work, please acknowledge this software and its dependencies. See Citing for more details.

License information

This software is distributed under the open-source Apache 2.0 license. See license for more details.

All trademarks referenced herein are property of their respective holders.

Help/Questions

If you run into any problems or have any code bugs or questions, please create a new GitHub Issue.

Eager to contribute?

See Contributing for more details.

Funding

Supported by the National Centre of Competence in Research (NCCR) “SYNAPSY – Synaptic Bases of Mental Diseases” NCCR-SYNAPSY (NCCR-SYNAPSY website / NCCR-SYNAPSY Swiss National Science Foundation Page) under SNF-185897 grant.

Contents

Installation Instructions

Prerequisites

NeuroDataPub is distributed through the Python Package Index, and its dependencies are installed in a conda Python 3.8 environment. To run NeuroDataPub, you therefore first need to install Miniconda 3 (see Installation of Miniconda 3 for instructions).

Once Miniconda 3 is installed, the recommended way to run NeuroDataPub is to use the NeuroDataPub Assistant. Usage instructions can be found in NeuroDataPub Assistant Guide. However, if you are comfortable creating JSON files by hand, or you prefer the command-line interface, usage instructions for the neurodatapub command-line interface can be found in Running neurodatapub.

Installation of Miniconda 3
  • Download the Miniconda 3 installer corresponding to your 32/64-bit MacOSX/Linux/Windows system from https://conda.io/miniconda.html. This can alternatively be done in the terminal:

    $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-<YOUR-OS>-x86_64.sh
    
  • Execute the downloaded script to install it:

    $ bash Miniconda3-latest-<YOUR-OS>-x86_64.sh
    

Note

NeuroDataPub has been tested only on Ubuntu and MacOSX.

Creation of neurodatapub-env conda environment
  • Download the appropriate conda environment.yml or environment_macosx.yml depending on your OS.

    Important

    It seems there is no conda package for git-annex available on Mac. For convenience, we created an additional environment_macosx.yml miniconda3 environment where the line - git-annex=[...] has been removed. Git-annex should be installed on MacOSX using brew i.e. brew install git-annex. See https://git-annex.branchable.com/install/ for more details.

  • Create the neurodatapub-env conda environment:

    $ conda env create -f /path/to/downloaded/environment[_macosx].yml
    

    This will create a Python 3.8 environment with all dependencies installed.

Installation of NeuroDataPub

Once the neurodatapub-env conda environment has been created, activate it (conda activate neurodatapub-env) and install NeuroDataPub in it via pip:

$ pip install neurodatapub
  • You are almost ready to use NeuroDataPub! You need to have at least git-annex installed on the remote data server. Please see Remote Data Server Setup for instructions.
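Before moving on, it can help to confirm that the command-line tools involved are visible from the activated environment. The following is a minimal sketch and not part of NeuroDataPub: the helper name find_missing_tools and the list of tools checked are assumptions made for illustration.

```python
import shutil


def find_missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on the PATH.

    `which` is injectable so the check can be exercised without the tools
    being installed; by default it is `shutil.which`.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # Assumed list of tools; adjust it to your own setup.
    missing = find_missing_tools(["git", "git-annex", "datalad", "rsync"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found.")
```

Run it inside the neurodatapub-env environment; any tool reported as missing should be installed before continuing.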

Help/Questions

Code bugs can be reported by creating a new GitHub Issue.

Remote Data Server Setup

In this section, you will see how to set up your special remote data server to store Datalad-managed datasets. A normal user usually does not have root/admin privileges on the server, which prevents her/him from installing Datalad and git-annex system-wide via apt-get.

In this case, there exist two user-based installation solutions, depending on whether your remote data server has access to the internet:

  1. Internet access: Installation with Conda

  2. No internet access: Installation of standalone git-annex

Installation with Conda

Among the solutions described in the DataLad handbook, installation with Conda is the most convenient user-based installation, but it requires internet access from the remote data server.

In this situation, with Conda or Miniconda installed (If not, please check Installation of Miniconda 3 for instructions), the DataLad package can be installed from the conda-forge channel as follows:

$ conda install -c conda-forge datalad

In general, all software dependencies of DataLad (including git-annex) are automatically installed too.

Note

This approach has the advantage that any dataset could be then directly managed on the remote server with Datalad.

Installation of standalone git-annex

The remote data server might not be connected to the internet for security reasons, in which case it would be impossible to install DataLad via conda or pip. But do not worry! One can still use a Linux standalone distribution of git-annex. Installing it consists of the following steps:

  1. Download from the official website the Linux standalone for git-annex: git-annex-standalone-amd64.tar.gz.

  2. Create a folder called for instance Softwares in your /home directory with the mkdir command via ssh:

    $ ssh user@stockage.server.ch \
    "mkdir -p /home/user/Softwares"
    

    Important

    The command mkdir -p /home/user/Softwares MUST be put inside "" in order to pass and execute this command via ssh.

  3. Copy the downloaded archive to the created folder on the remote server. This can be achieved with the scp command:

    $ scp /local/path/to/git-annex-standalone-amd64.tar.gz \
    user@stockage.server.ch:/home/user/Softwares/git-annex-standalone-amd64.tar.gz
    
  4. Extract the content of the archive to a folder git-annex-standalone with the tar command, then remove the archive, via ssh:

    $ ssh user@stockage.server.ch \
    "mkdir -p /home/user/Softwares/git-annex-standalone && tar xzvf /home/user/Softwares/git-annex-standalone-amd64.tar.gz -C /home/user/Softwares/git-annex-standalone --strip-components=1 && rm /home/user/Softwares/git-annex-standalone-amd64.tar.gz"
    

    Important

    The command tar [...] && rm [...] MUST be put inside "" in order to pass and execute this sequence of commands via ssh.

  5. Connect to the remote data server via ssh:

    $ ssh user@stockage.server.ch
    

    Then, open the ~/.bashrc file with vim text editor for instance ($ vim ~/.bashrc) and add the following lines to update system PATH and LD_LIBRARY_PATH:

    export LD_LIBRARY_PATH="/home/user/Softwares/git-annex-standalone/bin:$LD_LIBRARY_PATH"
    export PATH="/home/user/Softwares/git-annex-standalone:$PATH"
    

    This finalizes the installation of the standalone git-annex binaries and libraries.

    Tip

    In vim, pressing the key i enters insert mode. When you are done, press the key esc and then type :wq followed by Enter to tell vim to save your changes (w) and quit (q).

Note

In this approach, only git-annex is installed on the remote server, so it is not possible to manage Datalad datasets with Datalad directly there. Doing so would require installing the dataset on a host machine where an installation of Datalad is available.
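The requirement that remote commands be wrapped in "" can also be met from a Python script, where the whole remote command is passed to ssh as a single argument. This is only a sketch under assumptions: the helper names build_ssh_command and build_remote_mkdir are hypothetical, and the login and path are the placeholders used in the steps above.

```python
import shlex


def build_ssh_command(login, remote_command):
    """Return an argument list that runs `remote_command` on the server.

    The whole remote command is passed as ONE argument, which plays the
    same role as the surrounding double quotes in the shell examples.
    """
    return ["ssh", login, remote_command]


def build_remote_mkdir(login, remote_dir):
    """Build the ssh call that creates `remote_dir` on the server."""
    # shlex.quote protects paths containing spaces on the remote side.
    return build_ssh_command(login, f"mkdir -p {shlex.quote(remote_dir)}")


if __name__ == "__main__":
    cmd = build_remote_mkdir("user@stockage.server.ch", "/home/user/Softwares")
    print(cmd)
    # To actually execute it: subprocess.run(cmd, check=True)
```

Building the call as a list avoids a second layer of quoting by the local shell, so only the remote shell interprets the command string.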

NeuroDataPub Assistant Guide

Important

Before using NeuroDataPub, the remote data server should provide at least an installation of git-annex. Please see Remote Data Server Setup for instructions.

Note also that NeuroDataPub takes as principal input the path of your dataset, which by default should comply with the Brain Imaging Data Structure (BIDS) format. If you are using a dataset in BIDS format, always make sure that it is valid before using NeuroDataPub, using the free, online BIDS Validator or its standalone version. See BIDS standard for more information about BIDS. If adopting the BIDS format does not make sense for your dataset, NeuroDataPub can also handle datasets that are not necessarily in BIDS format, since v0.4, with the --is_not_bids option.

Introduction

NeuroDataPub comes with a Graphical User Interface aka the NeuroDataPub Assistant to support not only the configuration of the siblings and the generation of the corresponding JSON configuration files, but also its execution in the three different modes.

1. Start the Graphical User Interface

In a terminal, activate the neurodatapub-env conda environment:

$ conda activate neurodatapub-env

Please check Creation of neurodatapub-env conda environment for more details about its creation.

After activation, the NeuroDataPub Assistant can be launched via the neurodatapub command-line interface with the --gui option flag:

$ neurodatapub --gui \
     (--dataset_dir '/local/path/to/input/bids/dataset' \)
     (--datalad_dir  '/local/path/to/output/datalad/dataset' \)
     (--git_annex_ssh_special_sibling_config '/local/path/to/special_annex_sibling_config.json' \)
     (--github_sibling_config '/local/path/to/github_sibling_config.json' \)
     (--osf_sibling_config '/local/path/to/osf_sibling_config.json')

Note

When you run the neurodatapub command-line interface with the --gui option, it is not required to specify the option flags required for a normal run from the commandline interface. However, if provided, the parameters will be used to initialize the configuration of the project.

2. Configure input and output directories

You can select or reconfigure your input dataset directory, its format (BIDS / non-BIDS) and the directory of the Datalad dataset that will be created in the first tab of the NeuroDataPub Assistant.

_images/neurodatapub_main_window.png

3. Configure the siblings

You can configure or reconfigure the settings for the special git-annex and GitHub remote siblings.

_images/neurodatapub_siblings_tab_window.png

3.1 Special remote sibling settings

Since v0.3, you can use either (1) the data storage server of your institution accessible via ssh or (2) the Open Science Framework (OSF) platform to host your annexed files.

3.1.1 Server accessible via ssh
_images/neurodatapub_siblings_tab_ssh_config.png

  • "remote_ssh_login" (mandatory): user’s login to the remote

  • "remote_ssh_url" (mandatory): SSH-URL of the remote in the form "ssh://..."

  • "remote_sibling_dir" (mandatory): Remote .git/ directory of the sibling dataset

3.1.2 OSF (Cloud)
_images/neurodatapub_siblings_tab_osf_config.png

  • "osf_dataset_title" (mandatory): Dataset title on OSF.

  • "osf_token" (mandatory): user’s OSF authentication token. To make a Personal Access Token, please go to the relevant OSF settings page and create one. If you do not have an OSF account yet, you will need to create one first.

3.2 GitHub sibling settings
_images/neurodatapub_siblings_tab_github_config.png

  • "github_login" (mandatory): user’s login to GitHub.

  • "github_email" (mandatory): user’s email associated with GitHub account.

  • "github_organization" (mandatory): GitHub organization the GitHub account has access to.

  • "github_token" (mandatory): user’s GitHub authentication token. Please see the “Creating a personal access token” GitHub documentation for more details on how to get one. Also make sure that the write:org and read:org options are enabled.

  • "github_repo_name" (mandatory): Dataset repository name on GitHub.

3.3 Create the JSON sibling configuration files

Settings for each of the different siblings can be saved in a JSON file by clicking on their respective Save JSON button.

4. Check the configuration and run NeuroDataPub

Before you can initiate the creation and / or publication of the datalad dataset, you need to have the NeuroDataPub Assistant check the configuration by clicking on the Check config button.

_images/neurodatapub_check_config_button.png

If the configuration is completely valid, this will enable the Create and Publish Dataset, Create Dataset, Publish Dataset buttons.

_images/neurodatapub_exec_buttons_enable.png

Then, you can run NeuroDataPub in one of the three execution modes by clicking on one of the buttons.

_images/neurodatapub_execution.png

Need more control?

Since v0.4, NeuroDataPub can be run in Generate script only mode to give more control to advanced users familiar with the Linux shell. If enabled, NeuroDataPub will run in a “dryrun” mode and will only create a Linux shell script called neurodatapub_%d-%m-%Y_%H-%M-%S.sh in the code/ directory of your input dataset that records all the underlying commands. If the code/ folder does not exist yet, it will be automatically created.

_images/neurodatapub_generate_script_execution.png

Note

You can always see the execution progress by checking the standard outputs in the terminal, such as the following:

$ neurodatapub --gui

[...]

############################################
# Check configuration
############################################

    * PyBIDS summary:
    BIDS Layout: ...localuser/Data/ds-sample | Subjects: 1 | Sessions: 1 | Runs: 0
    * remote_ssh_login: user
    * remote_ssh_url: ssh://stockage.server.ch
    * remote_sibling_dir: /home/user/Data/ds-sample/.git
    * github_login: user
    * github_repo_name: ds-sample

Configuration is valid!
############################################

############################################
# Creation of Datalad Dataset
############################################

> Initialize the Datalad dataset /home/localuser/Data/ds-sample/derivative/neurodatapub-v0.1
[INFO   ] Creating a new annex repo at /home/localuser/Data/ds-sample/derivative/neurodatapub-v0.1
[INFO   ] Running procedure cfg_text2git
[INFO   ] == Command start (output follows) =====
[INFO   ] == Command exit (modification check follows) =====
[INFO   ] Running procedure cfg_bids
[INFO   ] == Command start (output follows) =====
[INFO   ] Running procedure cfg_metadatatypes
[INFO   ] == Command start (output follows) =====
[INFO   ] == Command exit (modification check follows) =====
[INFO   ] == Command exit (modification check follows) =====
Dataset(/home/localuser/Data/ds-sample/derivative/neurodatapub-v0.1)

[...]

Support, bugs and new feature requests

All bugs, concerns and enhancement requests for this software are managed on GitHub and can be submitted at https://github.com/NCCR-SYNAPSY/neurodatapub/issues.

Commandline Usage

Important

Before using NeuroDataPub, the remote data server should provide at least an installation of git-annex. Please see Remote Data Server Setup for instructions.

Note also that NeuroDataPub takes as principal input the path of your dataset, which by default should comply with the Brain Imaging Data Structure (BIDS) format. If you are using a dataset in BIDS format, always make sure that it is valid before using NeuroDataPub, using the free, online BIDS Validator or its standalone version. See BIDS standard for more information about BIDS. If adopting the BIDS format does not make sense for your dataset, NeuroDataPub can also handle datasets that are not necessarily in BIDS format, since v0.4, with the --is_not_bids option.

Commandline Arguments

Command-line argument parser of NeuroDataPub (v0.4)

usage: neurodatapub [-h] --mode {all,create-only,publish-only} --dataset_dir
                    DATASET_DIR [--is_not_bids] --datalad_dir DATALAD_DIR
                    --github_sibling_config GITHUB_SIBLING_CONFIG
                    (--git_annex_ssh_special_sibling_config GIT_ANNEX_SSH_SPECIAL_SIBLING_CONFIG | --osf_sibling_config OSF_SIBLING_CONFIG)
                    [--gui] [--generate_script] [-v]
Named Arguments
--mode

Possible choices: all, create-only, publish-only

Mode in which neurodatapub is run: "create-only" create the datalad dataset only, "publish-only" publish the datalad dataset only, "all" create and publish the datalad dataset.

--dataset_dir

The directory with the input dataset formatted according to the BIDS standard.

--is_not_bids

Specify if the directory with the input dataset is not formatted according to the BIDS standard.

Default: False

--datalad_dir

The local directory where the Datalad dataset should be.

--github_sibling_config

Path to a JSON file containing configuration parameters for the GitHub dataset repository sibling.

--git_annex_ssh_special_sibling_config

Path to a JSON file containing configuration parameters for the git-annex SSH special remote dataset sibling.

--osf_sibling_config

Path to a JSON file containing configuration parameters for the git-annex OSF special remote dataset sibling.

--gui

Run NeuroDataPub in GUI mode.

Default: False

--generate_script

Dry run that generates a bash script called neurodatapub_DD-MM-YYYY_hh:mm:ss.sh in the code/ folder of the input dataset that records all commands for later execution.

Default: False

-v, --version

show program’s version number and exit

Sibling configuration files

Git-annex special remote sibling configuration file

The Git-annex special remote sibling configuration file specified by the input flag --git_annex_ssh_special_sibling_config adopts the following JSON schema:

{
    "remote_ssh_login": "user",
    "remote_ssh_url": "ssh://neurodatapub.server.org",
    "remote_sibling_dir": "/remote/path/of/dataset/sibling/.git"
}
where:
  • "remote_ssh_login" (mandatory): user’s login to the remote

  • "remote_ssh_url" (mandatory): SSH-URL of the remote in the form “ssh://…”

  • "remote_sibling_dir" (mandatory): Remote .git/ directory of the sibling dataset
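If you prefer to generate this file from a script rather than by hand, the following sketch shows one way to do it with Python's standard json module. The helper name make_ssh_sibling_config and the output file name are assumptions made for illustration; only the three keys come from the schema above.

```python
import json
from pathlib import Path


def make_ssh_sibling_config(login, ssh_url, sibling_dir):
    """Return the configuration dictionary for the SSH special remote."""
    if not ssh_url.startswith("ssh://"):
        raise ValueError('remote_ssh_url must be of the form "ssh://..."')
    return {
        "remote_ssh_login": login,
        "remote_ssh_url": ssh_url,
        "remote_sibling_dir": sibling_dir,
    }


if __name__ == "__main__":
    config = make_ssh_sibling_config(
        "user",
        "ssh://neurodatapub.server.org",
        "/remote/path/of/dataset/sibling/.git",
    )
    # Hypothetical output file name; pass its path to
    # --git_annex_ssh_special_sibling_config afterwards.
    out_file = Path("special_annex_sibling_config.json")
    out_file.write_text(json.dumps(config, indent=4) + "\n")
    print(f"Wrote {out_file}")
```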

GitHub sibling configuration file

The GitHub sibling configuration file specified by the input flag --github_sibling_config adopts the following JSON schema:

{
    "github_login": "GitHubUserName",
    "github_email": "GitHubUserEmail",
    "github_organization": "NCCR-SYNAPSY",
    "github_token": "Personal github authentication token",
    "github_repo_name": "DatasetName"
}
where:
  • "github_login" (mandatory): user’s login to GitHub.

  • "github_email" (mandatory): user’s email associated with GitHub account.

  • "github_organization" (mandatory): GitHub organization the GitHub account has access to.

  • "github_token" (mandatory): user’s GitHub authentication token. Please see the “Creating a personal access token” GitHub documentation for more details on how to get one. Also make sure that the write:org and read:org options are enabled.

  • "github_repo_name" (mandatory): Dataset repository name on GitHub.

OSF sibling configuration file

The OSF sibling configuration file specified by the input flag --osf_sibling_config adopts the following JSON schema:

{
    "osf_dataset_title": "DatasetName",
    "osf_token": "Personal OSF authentication token"
}
where:
  • "osf_dataset_title" (mandatory): Dataset title on OSF.

  • "osf_token" (mandatory): user’s OSF authentication token. To make a Personal Access Token, please go to the relevant OSF settings page and create one. If you do not have an OSF account yet, you will need to create one first.
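Sibling configuration files must be strict JSON, so a stray trailing comma after the last field is enough to make a file unparseable. A quick sanity check can be sketched with Python's standard json module; the helper name parse_config_text is hypothetical and is not part of NeuroDataPub.

```python
import json


def parse_config_text(text):
    """Parse the text of a sibling configuration file as strict JSON.

    Returns the parsed content, or raises ValueError reporting the
    position of the first syntax error.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err


if __name__ == "__main__":
    good = '{"osf_dataset_title": "DatasetName", "osf_token": "token"}'
    bad = '{"osf_dataset_title": "DatasetName", "osf_token": "token",}'
    print(parse_config_text(good))
    try:
        parse_config_text(bad)
    except ValueError as err:
        print(err)
```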

Running neurodatapub

The neurodatapub command-line interface can be run in the “create-only”, “publish-only”, and “all” modes with the --mode option flag (as described in Commandline Arguments). For example, an invocation of the interface to create and publish a dataset (“all” mode) to an ssh sibling would be as follows:

$ neurodatapub --mode "all" \
     --dataset_dir '/local/path/to/input/bids/dataset' \
     --datalad_dir  '/local/path/to/output/datalad/dataset' \
     --git_annex_ssh_special_sibling_config '/local/path/to/special_annex_sibling_config.json' \
     --github_sibling_config '/local/path/to/github_sibling_config.json'

Note

When you use the command-line interface directly, you need to provide the JSON files with the option flags --github_sibling_config and either --git_annex_ssh_special_sibling_config or --osf_sibling_config to describe the configuration of the GitHub and special remote dataset siblings.
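When neurodatapub is called from a script, the same invocation can be assembled as an argument list. The sketch below only uses option flags listed in Commandline Arguments; the helper name build_neurodatapub_command and its parameter names are assumptions made for illustration.

```python
import shlex


def build_neurodatapub_command(mode, dataset_dir, datalad_dir, github_config,
                               ssh_config=None, osf_config=None,
                               is_not_bids=False, generate_script=False):
    """Assemble a neurodatapub call as an argument list."""
    if mode not in ("all", "create-only", "publish-only"):
        raise ValueError(f"Unknown mode: {mode}")
    # The two special remote options are mutually exclusive and one is required.
    if (ssh_config is None) == (osf_config is None):
        raise ValueError("Provide exactly one of ssh_config or osf_config.")
    cmd = ["neurodatapub", "--mode", mode,
           "--dataset_dir", dataset_dir,
           "--datalad_dir", datalad_dir,
           "--github_sibling_config", github_config]
    if ssh_config is not None:
        cmd += ["--git_annex_ssh_special_sibling_config", ssh_config]
    else:
        cmd += ["--osf_sibling_config", osf_config]
    if is_not_bids:
        cmd.append("--is_not_bids")
    if generate_script:
        cmd.append("--generate_script")
    return cmd


if __name__ == "__main__":
    cmd = build_neurodatapub_command(
        "all",
        "/local/path/to/input/bids/dataset",
        "/local/path/to/output/datalad/dataset",
        "/local/path/to/github_sibling_config.json",
        ssh_config="/local/path/to/special_annex_sibling_config.json",
    )
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```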

Need more control?

Since v0.4, NeuroDataPub can be run with the --generate_script option to give more control to more advanced users familiar with the Linux shell:

$ neurodatapub --mode "all" \
     --generate_script \
     --dataset_dir '/local/path/to/input/bids/dataset' \
     --datalad_dir  '/local/path/to/output/datalad/dataset' \
     --git_annex_ssh_special_sibling_config '/local/path/to/special_annex_sibling_config.json' \
     --github_sibling_config '/local/path/to/github_sibling_config.json'

Using this option, NeuroDataPub will run in a “dryrun” mode and will only create a Linux shell script, called neurodatapub_%d-%m-%Y_%H-%M-%S.sh in the code/ directory of your input dataset, that records all the underlying commands. If the code/ folder does not exist yet, it will be automatically created.
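The placeholders in the script name are strftime-style date and time codes. As an illustration only (the helper name script_name and the date are arbitrary choices, not taken from NeuroDataPub), the pattern quoted above expands as follows:

```python
from datetime import datetime


def script_name(now):
    """Expand the documented file name pattern for a given datetime."""
    return now.strftime("neurodatapub_%d-%m-%Y_%H-%M-%S.sh")


# Arbitrary example date: 19 January 2022 at 14:05:09.
print(script_name(datetime(2022, 1, 19, 14, 5, 9)))
# prints neurodatapub_19-01-2022_14-05-09.sh
```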

Support, bugs and new feature requests

All bugs, concerns and enhancement requests for this software are managed on GitHub and can be submitted at https://github.com/NCCR-SYNAPSY/neurodatapub/issues.

BIDS Standard

By default, a dataset published with NeuroDataPub SHOULD ADOPT the BIDS standard for data organization. This means that NeuroDataPub handles by default only datasets formatted following the BIDS standard. However, since v0.4, NeuroDataPub can handle datasets that are not necessarily in BIDS format with the --is_not_bids option. See Commandline Usage for more details.

For more information about BIDS, please consult the BIDS Website and the Online BIDS Specifications. HeuDiConv can assist you in converting DICOM brain imaging data to BIDS. A nice tutorial can be found at BIDS Tutorial Series: HeuDiConv Walkthrough.

Important

Before using NeuroDataPub, we highly recommend you to validate your BIDS structured dataset with the free, online BIDS Validator.
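As a very rough first check before running the full BIDS Validator, one can verify that the dataset root contains a dataset_description.json file with the "Name" and "BIDSVersion" fields that the BIDS specification requires. The sketch below is not a substitute for the BIDS Validator, and the helper name check_dataset_description is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def check_dataset_description(dataset_dir):
    """Return a list of problems found in dataset_description.json.

    Only checks that the file exists, parses as JSON, and contains the
    "Name" and "BIDSVersion" fields. An empty list means none were found.
    """
    description_file = Path(dataset_dir) / "dataset_description.json"
    if not description_file.is_file():
        return ["dataset_description.json is missing"]
    try:
        description = json.loads(description_file.read_text())
    except json.JSONDecodeError as err:
        return [f"dataset_description.json is not valid JSON: {err}"]
    return [f'missing field "{key}"'
            for key in ("Name", "BIDSVersion") if key not in description]


if __name__ == "__main__":
    # Demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        print(check_dataset_description(tmp))
        (Path(tmp) / "dataset_description.json").write_text(
            json.dumps({"Name": "ds-sample", "BIDSVersion": "1.6.0"}))
        print(check_dataset_description(tmp))
```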

neurodatapub.cli.neurodatapub

neurodatapub.project

neurodatapub.ui.project

neurodatapub.utils

List of Modules

Modules

neurodatapub.utils.io: utils functions for input/output.

neurodatapub.utils.io.copy_content_to_datalad_dataset(bids_dir, datalad_dataset_dir, dryrun=False)[source]

Copy BIDS dataset content to target datalad dataset directory using rsync.

bids_dir : string

Local path of the BIDS dataset

datalad_dataset_dir : string

Local path of the directory of the datalad dataset being created

dryrun : bool

If True, only generates the commands and does not execute them (Default: False)

proc :

Output of call to rsync command via subprocess.run()

cmd : string

Equivalent output command
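To make the role of dryrun and of the two outputs more concrete, here is a hedged sketch of how such a copy could be written. It is not the NeuroDataPub implementation: the function name copy_content_sketch is hypothetical, and the rsync options (-a, --ignore-existing) are assumptions chosen for illustration.

```python
import shlex
import subprocess


def copy_content_sketch(src_dir, dst_dir, dryrun=False):
    """Copy the content of `src_dir` into `dst_dir` with rsync.

    Returns (proc, cmd): `proc` is the CompletedProcess, or None in a dry
    run, and `cmd` is the equivalent shell command as a string.
    """
    # The trailing slash makes rsync copy the directory content rather
    # than the directory itself. The options are assumptions of this sketch.
    args = ["rsync", "-a", "--ignore-existing",
            src_dir.rstrip("/") + "/", dst_dir]
    cmd = shlex.join(args)
    proc = None if dryrun else subprocess.run(args, check=True)
    return proc, cmd


if __name__ == "__main__":
    proc, cmd = copy_content_sketch("/data/bids", "/data/datalad", dryrun=True)
    print(cmd)
```

In a dry run nothing is copied; only the command string is produced, which is the kind of information a recorded shell script needs.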

neurodatapub.utils.jsonconfig: utils functions to handle JSON sibling configuration files.

neurodatapub.utils.jsonconfig.validate_json_sibling_config(json_file, sibling_type=None)[source]

Validate a JSON sibling configuration file.

json_file : str

Absolute path to JSON sibling configuration file

sibling_type : ['git-annex-special-sibling', 'github-sibling', 'osf-sibling']

Type of sibling configuration file
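The internals of this function are not reproduced here. As an illustration of what validating such a file can involve, the sketch below checks only that the mandatory keys documented in Sibling configuration files are present. The names REQUIRED_KEYS, missing_keys and check_json_sibling_config are hypothetical, and the real function may perform additional checks.

```python
import json

# Mandatory keys per sibling type, as listed in "Sibling configuration files".
REQUIRED_KEYS = {
    "git-annex-special-sibling": {
        "remote_ssh_login", "remote_ssh_url", "remote_sibling_dir"},
    "github-sibling": {
        "github_login", "github_email", "github_organization",
        "github_token", "github_repo_name"},
    "osf-sibling": {"osf_dataset_title", "osf_token"},
}


def missing_keys(config, sibling_type):
    """Return the sorted list of mandatory keys absent from `config`."""
    return sorted(REQUIRED_KEYS[sibling_type] - set(config))


def check_json_sibling_config(json_file, sibling_type):
    """Load `json_file` and raise ValueError if mandatory keys are missing."""
    with open(json_file) as f:
        config = json.load(f)
    missing = missing_keys(config, sibling_type)
    if missing:
        raise ValueError(f"Missing keys in {json_file}: {', '.join(missing)}")
    return config
```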

neurodatapub.utils.process: utils functions to run command via subprocess.

neurodatapub.utils.process.run(command, env=None, cwd=None)[source]

Function that executes a command. It runs the command specified as input via subprocess.run().

command : string

String containing the command to be executed (required)

env : os.environ

Specify a custom os.environ

cwd : Directory

Specify a custom current working directory

Specify a custom current working directory

>>> cmd = 'ls "/path/to/folder"'
>>> run(cmd) 
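For readers unfamiliar with subprocess.run(), a wrapper of this kind can be sketched as below. This is not the NeuroDataPub source: the name run_sketch is hypothetical, and the use of shell=True, capture_output and text are assumptions of the sketch.

```python
import subprocess


def run_sketch(command, env=None, cwd=None):
    """Run `command` through the shell and return the CompletedProcess."""
    return subprocess.run(
        command,
        shell=True,           # the command is given as a single string
        env=env,              # None means: inherit the current environment
        cwd=cwd,              # None means: stay in the current directory
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode the outputs as str
    )


if __name__ == "__main__":
    proc = run_sketch('echo "hello"')
    print(proc.returncode, proc.stdout.strip())
```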

neurodatapub.utils.qt: utils functions for Qt style sheets.

neurodatapub.utils.qt.return_folder_button_style_sheet()[source]

Return the Qt style sheet for the traitsui FileEditor and DirectoryEditor.

style_sheet_folder_button : str

Qt style sheet

neurodatapub.utils.qt.return_global_style_sheet()[source]

Return the global Qt style sheet of the GUI.

style_sheet : str

Qt style sheet

neurodatapub.utils.qt.return_save_json_button_style_sheet()[source]

Return the Qt style sheet for the button that saves JSON configuration files.

style_sheet_save_json_button : str

Qt style sheet

neurodatapub.utils.sshconfig: utils function to edit SSH config.

neurodatapub.utils.sshconfig.update_ssh_config(sshurl, user, dryrun=False)[source]

Add a new entry to the SSH config file (~/.ssh/config).

It sets the default user login to the SSH special remote.

sshurl : str

SSH URL of the git-annex special remote in the form ssh://server.example.org

user : str

User login for authentication to the git-annex special remote

dryrun : bool

If True, only generates the commands and does not execute them (Default: False)
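An SSH config entry that sets a default user for a host consists of a Host line followed by a User line. The sketch below shows how such an entry could be derived from the two arguments; the helper name make_ssh_config_entry is hypothetical and the exact entry written by NeuroDataPub may differ.

```python
from urllib.parse import urlparse


def make_ssh_config_entry(sshurl, user):
    """Return the text of an SSH config entry setting the default user."""
    host = urlparse(sshurl).hostname
    if host is None:
        raise ValueError(f"Cannot extract a host name from {sshurl!r}")
    return f"Host {host}\n    User {user}\n"


if __name__ == "__main__":
    print(make_ssh_config_entry("ssh://server.example.org", "user"))
```

With such an entry in ~/.ssh/config, ssh server.example.org connects as the given user without the login being repeated on the command line.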

Apache 2.0 License

Copyright (c) 2021, Connectomics Lab, University and University Hospital Center of Lausanne (UNIL-CHUV), Switzerland, and Contributors

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Citing

Important

  • If you are using NeuroDataPub, please acknowledge this software but also Datalad with the following:

    1. Tourbier S, Hagmann P., (2021). NCCR-SYNAPSY/neurodatapub: NCCR-SYNAPSY Neuroimaging Dataset Publishing Tool (Version v0.4). Zenodo.

    2. Halchenko et al., (2021). DataLad: distributed system for joint management of code, data, and their relationship. Journal of Open Source Software, 6(63), 3262, https://doi.org/10.21105/joss.03262.

Changes

Version 0.4

Date: January 19, 2021

Fourth beta release of NeuroDataPub that includes in particular the following changes.

New Features
  • Create a bash script that records all commands generated by NeuroDataPub for later execution.

  • Give the option to handle datasets that do not follow the BIDS standard. This can be configured either via the new --is_not_bids option flag of the commandline interface, or by disabling the “Dataset is bids” option in the “Configuration of Directories” tab.

  • Ignore existing files during the copy of dataset files with rsync.

Bug Fixes
  • Correct the pattern employed during the check of the schema of the JSON configurations for the "github_login" and "github_token" fields.

  • Do not reload datalad.api and do not reset osf credentials in authenticate_osf().

Minor changes
  • Copyright year has been updated in all files.

  • --bids_dir argument of the commandline interface has been changed to --dataset_dir.

  • The attribute input_bids_dir of NeuroDataPubProject has been changed to input_dataset_dir.

  • Suppress QXcbConnection: XCB error message during execution of NeuroDataPub.

Documentation
  • Update documentation for the creation of the bash script using NeuroDataPub Assistant.

  • Add details about the new option to handle non BIDS datasets.

More…

Please check the main release pull request PR#38.

Version 0.3

Date: August 31, 2021

Third beta release of NeuroDataPub that includes in particular the following changes.

New Features
  • Publish dataset (no annex) to GitHub and the annexed files to OSF. (See PR#35)

  • Improve automation of GitHub authentication and add github_organization to the config file. (See PR#36)

Documentation
  • Update documentation for the publication to OSF and update the image for the configuration of the siblings using NeuroDataPub Assistant. (See PR#37)

More…

Please check the main release pull request PR#32.

Version 0.2

Date: August 09, 2021

Second beta release of NeuroDataPub that includes in particular the following changes.

New Features
  • Update automatically the SSH config with an entry for the remote SSH server to configure the user login used by default by ssh. (See PR#25)

Documentation
  • Add documentation page to give instructions for the remote data server setup. (See PR#28)

  • Update documentation for the creation of the conda environment and the installation of git-annex on Linux and MacOSX. (See PR#23)

Bug Fixes
  • Replace old datalad.api.publish() with new datalad.api.push(). (See PR#22)

Note

datalad.api.publish() was not able to handle properly the publication of the special git-annex remote such that it was impossible to get the content of the annexed files.

Misc
  • Add conda/environment_macosx.yml, a conda environment file specific to MacOSX where git-annex is not included. (See PR#23)

  • Use content of README as long_description in setup.py for publication to PyPI. (See PR#26)

More…

Please check the main release pull request PR#24.

Version 0.1

Date: August 05, 2021

Beta release which provides a first working prototype of NeuroDataPub.

Features
  • Provide a commandline interface (CLI) to create and publish neuroimaging datasets to GitHub NCCR-SYNAPSY, with files annexed in a host institution, accessible via ssh.

  • Adopt a traits/traitsui model that extends the CLI with a graphical user interface, aka the NeuroDataPub Assistant, to improve its accessibility by non IT experts.

  • Provide a Conda environment.yml to support the installation of Python with all dependencies.

  • Provide a setup.py to make installation of the neurodatapub package easy with pip install.

  • Adopt CircleCI for continuous integration testing. CircleCI project page: https://app.circleci.com/pipelines/github/NCCR-SYNAPSY/neurodatapub

  • Use Codacy to support code reviews and monitor code quality over time. Codacy project page: https://app.codacy.com/gh/NCCR-SYNAPSY/neurodatapub/dashboard

More…

For more change details and development discussions, please check:

Contributing

This project follows the all contributors specification. Contributions in many different ways are welcome!

Contribution Types

Report Bugs

Report bugs at https://github.com/NCCR-SYNAPSY/neurodatapub/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.

  • Any details about your local setup that might be helpful in troubleshooting.

  • Detailed steps to reproduce the bug.

Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

Write Documentation

NeuroDataPub could always use more documentation, whether as part of the official NeuroDataPub docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to create an issue at https://github.com/NCCR-SYNAPSY/neurodatapub/issues.

If you are proposing a feature:

  • Explain in detail how it would work.

  • Keep the scope as narrow as possible, to make it easier to implement.

  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up NeuroDataPub for local development.

  1. Fork the neurodatapub repo on GitHub.

  2. Clone your fork locally:

    git clone git@github.com:your_name_here/neurodatapub.git
    cd neurodatapub
    
  3. Create a branch for local development:

    git checkout -b name-of-your-bugfix-or-feature
    
  4. Now you can make your changes locally.

Note

Please keep each commit as specific as possible to the change it describes. It is highly advised to track unstaged files with git status and to add the files involved in the change to the staging area one by one with git add <file>. The use of git add . is highly discouraged. When all the files for a given change are staged, commit them with a brief message that describes your change, using git commit -m "[COMMIT_TYPE]: Your detailed description of the change.", where [COMMIT_TYPE] can be [FIX] for a bug fix, [ENH] for a new feature, [MAINT] for code maintenance and typo fix, [DOC] for documentation, or [CI] for continuous integration testing.

  5. When you’re done making changes, push your branch to GitHub:

    git push origin name-of-your-bugfix-or-feature
    
  6. Submit a pull request through the GitHub website.
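The commit message convention described in the note above lends itself to an automatic check, for instance before pushing. This is only a sketch: the names COMMIT_PATTERN and is_valid_commit_message are hypothetical and the project does not necessarily enforce the convention this way.

```python
import re

# "[COMMIT_TYPE]: description" with the commit types listed in the note.
COMMIT_PATTERN = re.compile(r"^\[(FIX|ENH|MAINT|DOC|CI)\]: \S.*")


def is_valid_commit_message(message):
    """Check the first line of `message` against the documented prefix."""
    first_line = message.splitlines()[0] if message else ""
    return COMMIT_PATTERN.match(first_line) is not None


if __name__ == "__main__":
    for msg in ("[DOC]: Update the remote data server setup page",
                "fixed stuff"):
        print(f"{msg!r}: {is_valid_commit_message(msg)}")
```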

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. If the pull request adds functionality, the docs should be updated (See documentation build instructions).

  2. The pull request should work for Python 3.8. Check https://app.circleci.com/pipelines/github/NCCR-SYNAPSY/neurodatapub and make sure that the tests pass.

How to install NeuroDataPub locally
  1. Install the NeuroDataPub conda environment neurodatapub-env that provides a Python 3.8 environment:

    cd neurodatapub
    conda env create -f conda/environment.yml
    
  2. Activate the neurodatapub-env conda environment and install neurodatapub

    conda activate neurodatapub-env
    pip install .
    
How to build the documentation locally
  1. Install the NeuroDataPub conda environment neurodatapub-env with sphinx and all extensions to generate the documentation:

    cd neurodatapub
    conda env create -f conda/environment.yml
    
  2. Activate the conda environment neurodatapub-env and install neurodatapub

    conda activate neurodatapub-env
    pip install .
    
  3. Run the script build_sphinx_docs.sh to generate the HTML documentation in documentation/_build/html:

    bash build_sphinx_docs.sh
    

Note

Make sure to have activated the conda environment neurodatapub-env before running the script build_sphinx_docs.sh.

Not listed as a contributor?

This is easy: NeuroDataPub has the all contributors bot installed.

Just comment on an Issue or Pull Request (PR), asking @all-contributors to add you as a contributor:

@all-contributors please add <github_username> for <contributions>

<contributions>: See the Emoji Key Contribution Types Reference for a list of valid contribution types.

The all-contributors bot will create a PR to add you in the README and reply with the pull request details.

When the PR is merged you will have to make an extra Pull Request where you have to:

  1. add your entry in the zenodo.json (for that you will need an ORCID ID - https://orcid.org/). Doing so, you will appear as a contributor on Zenodo in future version releases of NeuroDataPub. Zenodo is used by NeuroDataPub to publish and archive each version release with a unique Digital Object Identifier (DOI), which can then be used for citation.

  2. update the content of the table in docs/contributors.rst with the new content generated by the bot in the README. Doing so, you will appear in the Contributors Page.


This document has been adapted from the MIALSRTK contributing guidelines and inspired by these great contributing guidelines.

Contributors


Sébastien Tourbier

💻 📖 🎨 🤔 🚇 🚧 🧑‍🏫 📆 💬 👀

Patric Hagmann

🔍