bidstools module

clabtoolkit.bidstools.str2entity(string)[source]

Converts a formatted string into a dictionary.

Parameters:

string (str) – String to convert, with the format key1-value1_key2-value2…suffix.extension.

Returns:

Dictionary containing the entities extracted from the string.

Return type:

dict

Examples

>>> str2entity("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz")
Returns: {'sub': '01', 'ses': 'M00', 'acq': '3T', 'dir': 'AP', 'run': '01', 'suffix': 'T1w', 'extension': 'nii.gz'}
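
The key1-value1_key2-value2…suffix.extension format described above can be illustrated with a small standalone parser. This is only a sketch of the naming convention, not clabtoolkit's implementation; the helper name parse_bids_name is hypothetical, and it assumes the extension starts at the first dot.

```python
def parse_bids_name(name: str) -> dict:
    """Illustrative parser for 'key1-value1_key2-value2_suffix.extension'."""
    # Assumption: everything after the first dot is the extension ("nii.gz").
    stem, _, extension = name.partition(".")
    entities = {}
    for chunk in stem.split("_"):
        key, sep, value = chunk.partition("-")
        if sep:
            # A "key-value" chunk becomes one dictionary entry.
            entities[key] = value
        else:
            # A chunk without a hyphen is taken to be the suffix (e.g. "T1w").
            entities["suffix"] = chunk
    if extension:
        entities["extension"] = extension
    return entities


print(parse_bids_name("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz"))
```
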
clabtoolkit.bidstools.entity2str(entity)[source]

Converts an entity dictionary to a string representation.

Parameters:

entity (dict) – Dictionary containing the entities.

Returns:

String containing the entities in the format key1-value1_key2-value2…suffix.extension.

Return type:

str

Examples

>>> entity2str({'sub': '01', 'ses': 'M00', 'acq': '3T', 'dir': 'AP', 'run': '01', 'suffix': 'T1w', 'extension': 'nii.gz'})
Returns: "sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz"
clabtoolkit.bidstools.delete_entity(entity, ent2rem)[source]

Removes specified keys from an entity dictionary or string representation.

Parameters:
  • entity (dict or str) – Dictionary or string containing the entities.

  • ent2rem (List[str], str or dict) – Entities to remove from the entity dictionary or string. If ent2rem is a dictionary, an entity is removed only when both its key and its value match.

Returns:

The updated entity as a dictionary or string (matching the input type).

Return type:

Union[dict, str]

Examples

>>> delete_entity("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz", "acq")
Returns: "sub-01_ses-M00_dir-AP_run-01_T1w.nii.gz"
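
The difference between passing key names and passing a dictionary as ent2rem can be sketched on a plain dictionary. This is an illustration under the assumption that a dictionary removes an entity only when key and value both match; drop_entities is a hypothetical helper, not part of clabtoolkit.

```python
def drop_entities(entities: dict, to_remove) -> dict:
    """Return a copy of entities without the requested keys."""
    result = dict(entities)
    if isinstance(to_remove, str):
        to_remove = [to_remove]
    if isinstance(to_remove, dict):
        # Dictionary form: remove only exact key/value combinations.
        for key, value in to_remove.items():
            if result.get(key) == value:
                del result[key]
    else:
        # String or list form: remove the keys whatever their value is.
        for key in to_remove:
            result.pop(key, None)
    return result


entities = {"sub": "01", "acq": "3T", "run": "01"}
print(drop_entities(entities, "acq"))            # key removed
print(drop_entities(entities, {"acq": "7T"}))    # value differs, nothing removed
print(drop_entities(entities, {"acq": "3T"}))    # key and value match, removed
```
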
clabtoolkit.bidstools.replace_entity_value(entity, ent2replace, verbose=False)[source]

Replaces values in an entity dictionary or string representation.

Parameters:
  • entity (dict or str) – Dictionary or string containing the entities.

  • ent2replace (dict or str) – Dictionary or string containing entities to replace with new values.

  • verbose (bool, optional) – If True, prints warnings for non-existent or empty values.

Returns:

Updated entity as a dictionary or string (matching the input type).

Return type:

Union[dict, str]

Examples

>>> replace_entity_value("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz", {"acq": "7T"})
Returns: "sub-01_ses-M00_acq-7T_dir-AP_run-01_T1w.nii.gz"
clabtoolkit.bidstools.replace_entity_key(entity, keys2replace, verbose=False)[source]

Replaces specified keys in an entity dictionary or string representation.

Parameters:
  • entity (dict or str) – Dictionary containing the entities or a string that follows the BIDS naming specifications.

  • keys2replace (dict) – Dictionary mapping old keys to new keys.

  • verbose (bool, optional) – If True, prints warnings for keys in keys2replace that are not found in entity.

Returns:

Updated entity as a dictionary or string (matching the input type).

Return type:

Union[dict, str]

Examples

>>> replace_entity_key("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz", {"acq": "TESTrep1", "dir": "TESTrep2"})
Returns: "sub-01_ses-M00_TESTrep1-3T_TESTrep2-AP_run-01_T1w.nii.gz"
clabtoolkit.bidstools.insert_entity(entity, entity2add, prev_entity=None)[source]

Adds entities to an existing entity dictionary or string representation.

Parameters:
  • entity (dict or str) – Dictionary containing the entities or a string that follows the BIDS naming specifications.

  • entity2add (dict) – Dictionary containing the entities to add. IMPORTANT: If the entity2add contains keys that already exist in the entity, they will not be added.

  • prev_entity (str, optional) – Key in entity after which to insert the new entities.

Returns:

Updated entity with the new entities added (matching the input type).

Return type:

Union[dict, str]

Examples

>>> insert_entity("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz", {"task": "rest"})
Returns: "sub-01_ses-M00_acq-3T_dir-AP_run-01_task-rest_T1w.nii.gz"
>>> insert_entity("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz", {"task": "rest"}, prev_entity="ses")
Returns: "sub-01_ses-M00_task-rest_acq-3T_dir-AP_run-01_T1w.nii.gz"
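
The placement rule shown in the two examples (new entities go just before the suffix unless prev_entity is given, and existing keys are skipped) can be sketched with an ordered dictionary. The helper insert_after is hypothetical and only mirrors the documented behaviour; how the library treats an unknown prev_entity is an assumption here (it falls back to the default position).

```python
def insert_after(entities: dict, new_items: dict, prev_key=None) -> dict:
    """Insert new_items after prev_key, or just before the suffix by default."""
    # Keys that already exist are not added again.
    new_items = {k: v for k, v in new_items.items() if k not in entities}
    # Assumption: an unknown prev_key behaves like no prev_key at all.
    if prev_key not in entities:
        prev_key = None
    result = {}
    inserted = False
    for key, value in entities.items():
        if prev_key is None and not inserted and key in ("suffix", "extension"):
            result.update(new_items)
            inserted = True
        result[key] = value
        if key == prev_key:
            result.update(new_items)
            inserted = True
    if not inserted:
        # No suffix or extension present: append at the end.
        result.update(new_items)
    return result


base = {"sub": "01", "ses": "M00", "run": "01", "suffix": "T1w", "extension": "nii.gz"}
print(list(insert_after(base, {"task": "rest"})))
print(list(insert_after(base, {"task": "rest"}, prev_key="ses")))
```
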
clabtoolkit.bidstools.recursively_replace_entity_value(root_dir, dict2old, dict2new)[source]

This method replaces the values of the specified entities in all the files and folders of a BIDS dataset.

Parameters:
  • root_dir (str) – Root directory of the BIDS dataset.

  • dict2old (dict or str) – Dictionary containing the entities to replace and their old values.

  • dict2new (dict or str) – Dictionary containing the entities to replace and their new values.

clabtoolkit.bidstools.recursively_replace_entity_key(root_dir, replacements)[source]

This method replaces the keys of the specified entities in all the files and folders of a BIDS dataset.

Parameters:
  • root_dir (str) – Root directory of the BIDS dataset.

  • replacements (dict) – Dictionary mapping the entity keys to replace to their new keys. Example: {'acq': 'desc', 'run': 'runny'}

Returns:

Nothing is returned. The files and folders in the BIDS dataset are renamed in place: wherever a file or folder name contains one of the old entity keys, that key is replaced with the new one.

Return type:

None
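
A recursive rename of this kind is commonly done bottom-up, so that a folder is renamed only after its contents. The following is a minimal sketch of that idea using os.walk, not the library's code; rename_chunk and rename_keys_in_tree are hypothetical names, and the demonstration runs on a temporary directory.

```python
import os
import tempfile


def rename_chunk(chunk: str, replacements: dict) -> str:
    """Swap the key of one 'key-value' chunk if it is listed in replacements."""
    key, sep, value = chunk.partition("-")
    if sep and key in replacements:
        return replacements[key] + sep + value
    return chunk


def rename_keys_in_tree(root_dir: str, replacements: dict) -> None:
    """Rename files and folders under root_dir, deepest entries first."""
    # topdown=False yields children before their parents, so a parent folder
    # is only renamed after everything inside it has been handled.
    for current, dirnames, filenames in os.walk(root_dir, topdown=False):
        for name in filenames + dirnames:
            new_name = "_".join(
                rename_chunk(chunk, replacements) for chunk in name.split("_")
            )
            if new_name != name:
                os.rename(
                    os.path.join(current, name), os.path.join(current, new_name)
                )


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    anat = os.path.join(tmp, "sub-01", "anat")
    os.makedirs(anat)
    open(os.path.join(anat, "sub-01_acq-3T_run-01_T1w.nii.gz"), "w").close()
    rename_keys_in_tree(tmp, {"acq": "desc", "run": "runny"})
    print(os.listdir(anat))  # ['sub-01_desc-3T_runny-01_T1w.nii.gz']
```
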

clabtoolkit.bidstools.recursively_delete_entity(root_dir, key2rem)[source]

This method deletes entities in all the files and folders of a BIDS dataset.

Parameters:
  • root_dir (str) – Root directory of the BIDS dataset.

  • key2rem (list or str) – Key(s) of the entities that will be removed from the files and folders.

Returns:

Nothing is returned. The files and folders in the BIDS dataset are renamed in place, removing the entities with the specified keys from their names.

Return type:

None

clabtoolkit.bidstools.recursively_insert_entity(root_dir, entity2add, prev_entity=None)[source]

This method inserts entities in all the files and folders of a BIDS dataset.

Parameters:
  • root_dir (str) – Root directory of the BIDS dataset.

  • entity2add (dict) – Dictionary containing the entities to add. Example: {'task': 'rest', 'run': '01'}

  • prev_entity (str, optional) – Key in entity after which to insert the new entities. If not given, the new entities are added at the end of the file name, just before the suffix.

Returns:

Nothing is returned. The files and folders in the BIDS dataset are renamed in place, with the new entities inserted into their names.

Return type:

None

clabtoolkit.bidstools.get_all_entities(root_dir)[source]

Returns a set of all unique entities found in the BIDS dataset.

Parameters:

root_dir (str) – Root directory of the BIDS dataset.

Returns:

  • all_entities (Set[str]) – A set of unique entity names found in the dataset.

  • all_suffixes (List[str]) – A list of unique suffixes found in the dataset.

Raises:
  • ValueError – If the specified root directory does not exist.

  • FileNotFoundError – If the default configuration file is not found.

  • ValueError – If the default configuration JSON does not have the expected structure.

Return type:

Tuple[Dict[str, Set[str]], List[str]]

Examples

>>> get_all_entities('/path/to/bids/dataset')
{'sub', 'ses', 'task', 'run', ...}, ['T1w', 'bold', ...]
clabtoolkit.bidstools.entities4table(entities_json=None, selected_entities=None)[source]

Returns the BIDS entities that will be included in the morphometric table.

This function loads BIDS entities from a JSON configuration file and filters them based on optional selected entities.

Parameters:
  • entities_json (str, optional) – Path to the JSON file with entity definitions. If None, the method uses the default config JSON file.

  • selected_entities (Union[str, Dict, List], optional) – Entities to select from the loaded entities. Can be:
      - A string with comma-separated entity names
      - A dictionary with entity names as keys
      - A list of entity names
    If None, all entities are included.

Returns:

Dictionary of entity names and their values.

Return type:

Dict

Raises:
  • ValueError – If the provided JSON file path is invalid or the JSON format is incorrect.

  • FileNotFoundError – If the specified JSON file does not exist.

Examples

>>> # Using default config file (returns all entities)
>>> entities4table()
{'sub': {'...'}, 'ses': {'...'}, ... 'scale': {'...'}}
>>> # Using a custom JSON file
>>> entities4table('path/to/custom/entities.json')
{'sub': {'...'}, 'ses': {'...'}, ... 'scale': {'...'}}
>>> # Selecting specific entities
>>> entities4table(selected_entities='sub,ses,run')
{'sub': {'...'}, 'ses': {'...'}, 'run': {'...'}}
>>> # Using a dictionary to select entities
>>> entities4table(selected_entities={'sub': None, 'ses': None})
{'sub': {'...'}, 'ses': {'...'}}
>>> # Using a list to select entities
>>> entities4table(selected_entities=['sub', 'ses'])
{'sub': {'...'}, 'ses': {'...'}}
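
The three accepted forms of selected_entities can be reduced to a list of names before filtering. The sketch below shows one way to do that; normalize_selection and filter_entities are hypothetical helpers, and the entity dictionary holds placeholder values rather than the contents of the real configuration file.

```python
def normalize_selection(selected):
    """Turn a comma-separated string, dict, or list into a list of names."""
    if selected is None:
        return None
    if isinstance(selected, str):
        return [name.strip() for name in selected.split(",") if name.strip()]
    if isinstance(selected, dict):
        return list(selected.keys())
    return list(selected)


def filter_entities(all_entities: dict, selected=None) -> dict:
    """Keep only the selected entities; None keeps everything."""
    names = normalize_selection(selected)
    if names is None:
        return dict(all_entities)
    return {name: all_entities[name] for name in names if name in all_entities}


# Placeholder values; the real configuration content is not reproduced here.
loaded = {"sub": "placeholder-1", "ses": "placeholder-2", "run": "placeholder-3"}
print(filter_entities(loaded, "sub, ses"))
print(filter_entities(loaded, {"sub": None, "run": None}))
print(filter_entities(loaded, ["ses"]))
```
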
clabtoolkit.bidstools.entities_to_table(filepath, entities_to_extract=None, include_suffix=False)[source]

Creates a DataFrame with BIDS entities extracted from a filename.

Parameters:
  • filepath (str) – Full path to the BIDS file from which to extract entities.

  • entities_to_extract (str, list, dict, or None, default=None) – Specifies which entities to extract from the filename:
      - If str: a single entity name to extract
      - If list: multiple entity names to extract
      - If dict: keys are entity names, values are custom column names
      - If None: returns a single column with the full filename

  • include_suffix (bool, default=False) – If True, adds a ‘Type’ column containing the BIDS suffix (e.g., ‘bold’, ‘T1w’, ‘dwi’) extracted from the filename.

Returns:

DataFrame containing the extracted entities as columns. If include_suffix is True, a ‘Type’ column is appended at the end. If the file is not BIDS-compliant, returns an empty DataFrame.

Return type:

pd.DataFrame

Examples

>>> df = entities_to_table(
...     '/data/sub-01/ses-pre/sub-01_ses-pre_task-rest_bold.nii.gz',
...     ['sub', 'ses'],
...     include_suffix=True
... )
>>> print(df)
    Participant Session  Type
0          01     pre  bold
clabtoolkit.bidstools.get_subjects(bids_dir)[source]

Get a list of all subjects in the BIDS directory.

Parameters:

bids_dir (str) – Path to the BIDS directory.

Returns:

List of subject IDs.

Return type:

list

Examples

>>> bids_dir = "/path/to/bids"
>>> print(get_subjects(bids_dir))
["sub-01", "sub-02", …]
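
Listing subjects amounts to collecting the sub-* folders at the top of the dataset. A minimal standalone sketch with pathlib follows; list_subject_dirs is a hypothetical name, and whether the library also sorts the result or inspects nested folders is not stated in this page.

```python
import tempfile
from pathlib import Path


def list_subject_dirs(bids_dir: str) -> list:
    """Return the sorted names of the 'sub-*' folders directly under bids_dir."""
    root = Path(bids_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and entry.name.startswith("sub-")
    )


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sub-02", "sub-01", "derivatives"):
        (Path(tmp) / name).mkdir()
    (Path(tmp) / "sub-99_notes.txt").touch()  # a file, so it is ignored
    print(list_subject_dirs(tmp))  # ['sub-01', 'sub-02']
```
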

clabtoolkit.bidstools.copy_bids_folder(bids_dir, out_dir, subjects_to_copy=None, folders_to_copy='all', deriv_dir=None, include_derivatives=None)[source]

This function copies the BIDS folder and its derivatives for given subjects to a new location.

Parameters:
  • bids_dir (str) – Path to the BIDS directory.

  • out_dir (str) – Path to the output directory where the copied BIDS folder will be saved.

  • subjects_to_copy (list or str, optional) – List of subject IDs to copy. If None, all subjects will be copied.

  • folders_to_copy (list or str, optional) – List of BIDS folders to copy. If “all”, all folders will be copied. Default is “all”.

  • deriv_dir (str, optional) – Path to the derivatives directory. If None, it will be set to “derivatives” in the BIDS directory.

  • include_derivatives (str or list, optional) – Derivatives to include. If None (default), no derivatives are copied. If “all”, all derivatives are copied. If any other string (e.g., “chimera”), only the derivative with that name is copied. If a list, only the derivatives in the list are copied.

Returns:

None. Copies the specified folders and subjects to the output directory.

Examples

>>> bids_dir = "/path/to/bids"
>>> out_dir = "/path/to/output"
>>> copy_bids_folder(bids_dir, out_dir, subjects_to_copy=["sub-01"], folders_to_copy=["anat"])
>>> copy_bids_folder(bids_dir, out_dir, subjects_to_copy=["sub-01"], include_derivatives=["chimera", "freesurfer"])
>>> copy_bids_folder(bids_dir, out_dir, subjects_to_copy=["sub-01"], deriv_dir="/path/to/derivatives")

clabtoolkit.bidstools.get_bids_database_table(root_dir, output_table=None)[source]

Generate a comprehensive summary table of all neuroimaging files in a BIDS dataset.

This function scans a BIDS dataset directory structure and creates a detailed table containing all BIDS entities (subject, session, acquisition, etc.) and file counts. The output table provides an overview of the dataset composition, making it easy to identify data availability, missing files, and dataset structure.

Parameters:
  • root_dir (str) – Path to the BIDS dataset root directory. This should be the top-level directory containing subject folders (sub-*) and optionally a dataset_description.json file.

  • output_table (str, optional) – Path where the resulting CSV table should be saved. If None, the table is not saved to disk but still returned as a DataFrame. Default is None.

Returns:

DataFrame with columns for each detected BIDS entity (Subject, Session, Acquisition, etc.), plus ‘suffix’ (image type like T1w, FLAIR) and ‘N’ (number of files for each unique combination). Each row represents a unique combination of BIDS entities and their file count.

Return type:

pd.DataFrame

Raises:

Examples

Basic usage - analyze dataset and return summary table:

>>> import pandas as pd
>>> bids_table = get_bids_database_table('/path/to/bids/dataset')
>>> print(f"Dataset contains {len(bids_table)} unique file combinations")
>>> print(f"Total files: {bids_table['N'].sum()}")

Save summary table to CSV file:

>>> bids_table = get_bids_database_table(
...     root_dir='/data/my_study',
...     output_table='/data/my_study/bids_summary.csv'
... )

Analyze specific aspects of the dataset:

>>> # Count files by image type
>>> suffix_counts = bids_table.groupby('suffix')['N'].sum()
>>> print("Files by image type:")
>>> print(suffix_counts)
>>> # Check data availability per subject
>>> subject_counts = bids_table.groupby('Subject')['N'].sum()
>>> print("Files per subject:")
>>> print(subject_counts)
>>> # Find subjects with specific image types
>>> t1w_subjects = bids_table[bids_table['suffix'] == 'T1w']['Subject'].unique()
>>> print(f"Subjects with T1w images: {len(t1w_subjects)}")

Example output table structure:

>>> print(bids_table.head())
  Subject Session Acquisition suffix  N
0  sub-01  ses-01  acq-mprage    T1w  1
1  sub-01  ses-01   acq-space    T2w  1
2  sub-01  ses-01        None  FLAIR  1
3  sub-02  ses-01  acq-mprage    T1w  1
4  sub-02  ses-02  acq-mprage    T1w  1

Notes

  • Only processes .nii.gz files (NIfTI compressed format)

  • Automatically detects all BIDS entities present in the dataset

  • Groups identical combinations and sums file counts

  • Results are sorted by Subject, Session, and suffix for readability

  • Progress is displayed using Rich progress bar during processing

  • Column names are converted to human-readable format (e.g., ‘sub’ -> ‘Subject’)

See also

clabtoolkit.bidstools.get_subjects

Get list of subjects in BIDS dataset

clabtoolkit.bidstools.get_all_entities

Extract all BIDS entities from dataset

clabtoolkit.bidstools.str2entity

Parse BIDS filename to extract entities
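
The grouping step described in the notes for get_bids_database_table (identical entity combinations are grouped and their files counted) can be sketched with the standard library alone. This is an illustration only: count_combinations is a hypothetical helper, it works on a list of file names rather than a directory, and it does not build a DataFrame.

```python
from collections import Counter


def count_combinations(filenames, keys=("sub", "ses")) -> Counter:
    """Count .nii.gz files per (entity values..., suffix) combination."""
    counts = Counter()
    for name in filenames:
        # Only compressed NIfTI files are considered, as in the notes.
        if not name.endswith(".nii.gz"):
            continue
        chunks = name[: -len(".nii.gz")].split("_")
        entities = dict(chunk.split("-", 1) for chunk in chunks if "-" in chunk)
        suffix = chunks[-1]
        combo = tuple(entities.get(key) for key in keys) + (suffix,)
        counts[combo] += 1
    return counts


files = [
    "sub-01_ses-01_run-01_T1w.nii.gz",
    "sub-01_ses-01_run-02_T1w.nii.gz",
    "sub-01_ses-01_T2w.nii.gz",
    "sub-02_ses-01_T1w.nii.gz",
    "sub-02_ses-01_T1w.json",
]
for combo, n in sorted(count_combinations(files).items()):
    print(combo, n)
```
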

clabtoolkit.bidstools.get_derivatives_folders(deriv_dir)[source]

Get a list of all derivatives folders in the specified directory.

Parameters:

deriv_dir (str) – Path to the derivatives directory.

Returns:

List of derivatives folder names.

Return type:

list

Raises:
  • ValueError – If the derivatives directory does not exist.

  • TypeError – If the derivatives directory is not a string.

Examples

>>> deriv_dir = "/path/to/derivatives"
>>> print(get_derivatives_folders(deriv_dir))

clabtoolkit.bidstools.is_bids_filename(filename)[source]

Validates a BIDS filename structure, handling extensions and entity order.

Parameters:

filename (str) – The filename to validate.

Returns:

True if valid BIDS filename, False otherwise.

Return type:

bool

clabtoolkit.bidstools.get_individual_files_and_folders(input_folder, cad4query)[source]

This function detects all the files or folders inside a folder and its subfolders containing the strings supplied by the variable cad4query.

Parameters:
  • input_folder (str) – Path to the input folder.

  • cad4query (str, list, or dict) – String or list of strings to filter the files and folders. If a dictionary is provided, it should contain key-value pairs where the key is the string before ‘-’ and the value is the string after ‘-’.

Returns:

List of files or folders that match the query.

Return type:

list

Raises:
  • ValueError – If the input folder does not exist.

  • TypeError – If the input folder is not a string.

Examples

>>> input_folder = "/path/to/input/folder"
>>> cad4query = "sub-01"
>>> files = get_individual_files_and_folders(input_folder, cad4query)
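
The dictionary form of cad4query corresponds to "key-value" strings searched for in each name. The sketch below shows that conversion; build_queries and name_matches are hypothetical helpers, and requiring every query string to be present (rather than any one of them) is an assumption, since this page does not say which rule the library applies.

```python
def build_queries(query) -> list:
    """Turn a string, list, or dict query into a list of search strings."""
    if isinstance(query, str):
        return [query]
    if isinstance(query, dict):
        # {"sub": "01"} becomes the search string "sub-01".
        return [f"{key}-{value}" for key, value in query.items()]
    return list(query)


def name_matches(name: str, query) -> bool:
    """Assumption: a name matches only if it contains every search string."""
    return all(text in name for text in build_queries(query))


names = [
    "sub-01_ses-M00_T1w.nii.gz",
    "sub-01_ses-M12_T1w.nii.gz",
    "sub-02_ses-M00_T1w.nii.gz",
]
print([n for n in names if name_matches(n, {"sub": "01", "ses": "M00"})])
```
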
clabtoolkit.bidstools.generate_bids_tree(bids_root, max_depth=None, show_hidden=False, exclude_patterns=None, save_to_file=None)[source]

Generate an MS-DOS tree-style visualization of a BIDS folder structure.

Parameters:
  • bids_root (str) – Path to the BIDS root directory.

  • max_depth (int, optional) – Maximum depth to traverse. If None (default), traverses entire directory structure without depth limitation.

  • show_hidden (bool, optional) – Whether to show hidden files and folders (starting with ‘.’). Default is False.

  • exclude_patterns (set of str, optional) – Set of file/folder name patterns to exclude from the tree. If None, defaults to {‘.git’, ‘__pycache__’, ‘.DS_Store’, ‘Thumbs.db’}.

  • save_to_file (str, optional) – Path to save the tree output as a text file. If None, only returns the string without saving.

Returns:

MS-DOS tree representation of the BIDS structure with proper tree symbols (├──, └──, │) and directory indicators (/).

Return type:

str

Raises:
  • FileNotFoundError – If the specified bids_root path does not exist.

  • NotADirectoryError – If the specified bids_root path is not a directory.

  • PermissionError – If there are insufficient permissions to read certain directories. Individual permission errors are handled gracefully and noted in output.

  • OSError – If there are file system related errors during tree generation or file saving operations.

Notes

  • Directories are displayed with a trailing ‘/’ to distinguish from files

  • Items are sorted with directories first, then files, both alphabetically

  • Hidden files/folders (starting with ‘.’) are excluded by default

  • Permission errors for individual subdirectories are handled gracefully

  • The tree uses standard MS-DOS tree symbols for proper visualization

  • When max_depth is None, the entire directory structure is traversed

Examples

Basic usage with unlimited depth:

>>> tree = generate_bids_tree('/path/to/bids/dataset')
>>> print(tree)
my-bids-dataset/
├── dataset_description.json
├── participants.tsv
├── sub-01/
│   ├── anat/
│   │   └── sub-01_T1w.nii.gz
│   └── func/
│       ├── sub-01_task-rest_bold.nii.gz
│       └── sub-01_task-rest_events.tsv
└── derivatives/
    └── preprocessing/
        └── sub-01/

Limited depth with file saving:

>>> tree = generate_bids_tree('/path/to/bids/dataset',
...                          max_depth=2,
...                          save_to_file='bids_tree.txt')
>>> print("Tree saved to bids_tree.txt")

Include hidden files and custom exclusions:

>>> tree = generate_bids_tree('/path/to/bids/dataset',
...                          show_hidden=True,
...                          exclude_patterns={'temp', 'backup'})
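
The tree layout described in the notes (directories first, trailing ‘/’ on directories, ├── and └── connectors, optional depth limit) can be reproduced with a short recursive function. This is a simplified sketch, not the library's implementation: render_tree is a hypothetical name, and hidden-file filtering, exclusion patterns, and error handling are left out.

```python
import os
import tempfile


def render_tree(root: str, max_depth=None) -> str:
    """Return a text tree of root, listing directories before files."""
    lines = [os.path.basename(os.path.normpath(root)) + "/"]

    def walk(path: str, prefix: str, depth: int) -> None:
        if max_depth is not None and depth >= max_depth:
            return
        # Directories first, then files, each group in alphabetical order.
        entries = sorted(os.scandir(path), key=lambda e: (not e.is_dir(), e.name))
        for index, entry in enumerate(entries):
            last = index == len(entries) - 1
            label = entry.name + ("/" if entry.is_dir() else "")
            lines.append(prefix + ("└── " if last else "├── ") + label)
            if entry.is_dir():
                # Keep the vertical bar only while siblings remain below.
                walk(entry.path, prefix + ("    " if last else "│   "), depth + 1)

    walk(root, "", 0)
    return "\n".join(lines)


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "my-bids-dataset")
    os.makedirs(os.path.join(root, "sub-01", "anat"))
    open(os.path.join(root, "dataset_description.json"), "w").close()
    open(os.path.join(root, "sub-01", "anat", "sub-01_T1w.nii.gz"), "w").close()
    print(render_tree(root))
```
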
clabtoolkit.bidstools.generate_bids_tree_with_stats(bids_root, **kwargs)[source]

Generate a BIDS tree with additional statistics.

Parameters:
  • bids_root (str) – Path to the BIDS root directory.

  • **kwargs – Additional keyword arguments passed to generate_bids_tree(). See generate_bids_tree() documentation for available parameters.

Returns:

Tree representation with file and folder count statistics appended.

Return type:

str

Raises:

Notes

Statistics are calculated by recursively counting all files and directories in the BIDS structure, regardless of the max_depth parameter used for tree visualization.

Examples

>>> tree_with_stats = generate_bids_tree_with_stats('/path/to/bids/dataset')
>>> print(tree_with_stats)
my-bids-dataset/
├── dataset_description.json
└── sub-01/
    └── anat/
        └── sub-01_T1w.nii.gz

Statistics:
├── Directories: 2
└── Files: 2

clabtoolkit.bidstools.validate_bids_structure(bids_root)[source]

Performs a basic validation of the BIDS structure and returns warnings.

Parameters:

bids_root (str) – Path to the BIDS root directory.

Returns:

List of validation warnings and notes about the BIDS structure. Empty list indicates no issues found.

Return type:

list of str

Raises:

Notes

This function performs basic BIDS validation including:

  • Checking for required files (dataset_description.json)

  • Verifying presence of subject directories (sub-*)

  • Noting presence of derivatives directory

For comprehensive BIDS validation, consider using the official BIDS validator tool.

Examples

>>> warnings = validate_bids_structure('/path/to/bids/dataset')
>>> if warnings:
...     for warning in warnings:
...         print(f"⚠️ {warning}")
... else:
...     print("✅ Basic BIDS structure looks good!")
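
The three checks listed in the notes can be written as a short standalone function. This is a sketch of the idea only; basic_bids_checks is a hypothetical name and the message texts are invented for illustration, not the ones the library returns.

```python
import tempfile
from pathlib import Path


def basic_bids_checks(bids_root: str) -> list:
    """Return a list of notes about a folder's basic BIDS layout."""
    root = Path(bids_root)
    notes = []
    # Required file at the top level of the dataset.
    if not (root / "dataset_description.json").is_file():
        notes.append("Missing required file: dataset_description.json")
    # At least one subject directory is expected.
    if not any(path.is_dir() for path in root.glob("sub-*")):
        notes.append("No subject directories (sub-*) found")
    # The derivatives folder is optional, so it is only noted.
    if (root / "derivatives").is_dir():
        notes.append("Derivatives directory present")
    return notes


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    print(basic_bids_checks(tmp))  # both problems are reported
    (Path(tmp) / "dataset_description.json").touch()
    (Path(tmp) / "sub-01").mkdir()
    print(basic_bids_checks(tmp))  # []
```
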
clabtoolkit.bidstools.load_bids_json(bids_json=None)[source]

Load the JSON file containing the BIDS configuration.

Parameters:

bids_json (str, optional) – Path to the JSON file containing the BIDS configuration.

Returns:

config_dict – Dictionary containing the loaded configuration.

Return type:

dict

The bidstools module provides comprehensive support for Brain Imaging Data Structure (BIDS) datasets. This module enables you to work with BIDS naming conventions, manipulate entities, organize datasets, and generate database tables from BIDS structures.

Key Features

  • Convert between BIDS filename strings and entity dictionaries

  • Manipulate BIDS entities (add, remove, replace)

  • Extract subject lists and dataset summaries

  • Copy and organize BIDS folders with filtering

  • Generate comprehensive database tables from BIDS datasets

  • Validate BIDS compliance

Common Usage Examples

Basic entity manipulation:

import clabtoolkit.bidstools as bids

# Parse BIDS filename
entities = bids.str2entity("sub-01_ses-M00_T1w.nii.gz")
# Returns: {'sub': '01', 'ses': 'M00', 'suffix': 'T1w', 'extension': 'nii.gz'}

# Convert back to filename
filename = bids.entity2str(entities)

# Modify entities
new_filename = bids.replace_entity_value(filename, {'ses': 'M12'})

Dataset management:

# Get all subjects in a BIDS dataset
subjects = bids.get_subjects("/path/to/bids/dataset")

# Generate dataset overview table
database = bids.get_bids_database_table("/path/to/bids/dataset")

# Copy subset of BIDS dataset
bids.copy_bids_folder(
    bids_dir="/path/to/source",
    out_dir="/path/to/target",
    subjects_to_copy=['sub-01', 'sub-02']
)