bidstools module
- clabtoolkit.bidstools.str2entity(string)[source]
Converts a formatted string into a dictionary.
- Parameters:
string (str) – String to convert, with the format key1-value1_key2-value2…suffix.extension.
- Returns:
Dictionary containing the entities extracted from the string.
- Return type:
Examples
>>> str2entity("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz")
Returns: {'sub': '01', 'ses': 'M00', 'acq': '3T', 'dir': 'AP', 'run': '01', 'suffix': 'T1w', 'extension': 'nii.gz'}
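The parsing rule can be illustrated with a few lines of plain Python. This is a minimal sketch, not clabtoolkit's implementation: `parse_entities` is a hypothetical helper, and it assumes that the extension starts at the first dot and that a token without a hyphen is the suffix.

```python
def parse_entities(name: str) -> dict:
    """Hypothetical sketch: split 'key-value_..._suffix.ext' into a dict."""
    # Assumption: everything after the first dot is the extension.
    stem, _, extension = name.partition(".")
    entities = {}
    for token in stem.split("_"):
        if "-" in token:
            key, value = token.split("-", 1)
            entities[key] = value
        else:
            # Assumption: a token without a hyphen is the suffix.
            entities["suffix"] = token
    if extension:
        entities["extension"] = extension
    return entities


print(parse_entities("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz"))
```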
- clabtoolkit.bidstools.entity2str(entity)[source]
Converts an entity dictionary to a string representation.
- Parameters:
entity (dict) – Dictionary containing the entities.
- Returns:
String containing the entities in the format key1-value1_key2-value2…suffix.extension.
- Return type:
Examples
>>> entity2str({'sub': '01', 'ses': 'M00', 'acq': '3T', 'dir': 'AP', 'run': '01', 'suffix': 'T1w', 'extension': 'nii.gz'})
Returns: "sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz"
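The reverse direction can be sketched the same way. `build_name` is a hypothetical helper rather than the library's code; it assumes the 'suffix' and 'extension' keys are special and that the remaining keys are written in dictionary order.

```python
def build_name(entities: dict) -> str:
    """Hypothetical sketch: join an entity dictionary back into a filename."""
    # Work on a copy so the caller's dictionary is left untouched.
    parts = dict(entities)
    suffix = parts.pop("suffix", None)
    extension = parts.pop("extension", None)
    tokens = [f"{key}-{value}" for key, value in parts.items()]
    if suffix:
        tokens.append(suffix)
    name = "_".join(tokens)
    if extension:
        name = f"{name}.{extension}"
    return name


print(build_name({"sub": "01", "ses": "M00", "suffix": "T1w", "extension": "nii.gz"}))
```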
- clabtoolkit.bidstools.delete_entity(entity, ent2rem)[source]
Removes specified keys from an entity dictionary or string representation.
- Parameters:
- Returns:
The updated entity as a dictionary or string (matching the input type).
- Return type:
Examples
>>> delete_entity("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz", "acq")
Returns: "sub-01_ses-M00_dir-AP_run-01_T1w.nii.gz"
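For string input, removal amounts to dropping the matching key-value tokens. The following is a rough sketch under that assumption; `drop_entities` is a hypothetical name and does not reflect how the library handles dictionaries or edge cases.

```python
def drop_entities(name: str, keys) -> str:
    """Hypothetical sketch: remove 'key-value' tokens for the given keys."""
    if isinstance(keys, str):
        keys = [keys]
    stem, dot, extension = name.partition(".")
    kept = [
        token
        for token in stem.split("_")
        # Tokens without a hyphen (the suffix) are always kept.
        if "-" not in token or token.split("-", 1)[0] not in keys
    ]
    return "_".join(kept) + dot + extension


print(drop_entities("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz", "acq"))
```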
- clabtoolkit.bidstools.replace_entity_value(entity, ent2replace, verbose=False)[source]
Replaces values in an entity dictionary or string representation.
- Parameters:
- Returns:
Updated entity as a dictionary or string (matching the input type).
- Return type:
Examples
>>> replace_entity_value("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz", {"acq": "7T"})
Returns: "sub-01_ses-M00_acq-7T_dir-AP_run-01_T1w.nii.gz"
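A value replacement on a filename string might look like the sketch below. `set_entity_values` is a hypothetical stand-in that only covers the string case and ignores the verbose flag.

```python
def set_entity_values(name: str, new_values: dict) -> str:
    """Hypothetical sketch: swap the values of selected entities in a filename."""
    stem, dot, extension = name.partition(".")
    tokens = []
    for token in stem.split("_"):
        key, sep, _ = token.partition("-")
        # Only real key-value tokens whose key was requested are rewritten.
        if sep and key in new_values:
            token = f"{key}-{new_values[key]}"
        tokens.append(token)
    return "_".join(tokens) + dot + extension


print(set_entity_values("sub-01_ses-M00_acq-3T_T1w.nii.gz", {"acq": "7T"}))
```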
- clabtoolkit.bidstools.replace_entity_key(entity, keys2replace, verbose=False)[source]
Replaces specified keys in an entity dictionary or string representation.
- Parameters:
- Returns:
Updated entity as a dictionary or string (matching the input type).
- Return type:
Examples
>>> replace_entity_key("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz", {"acq": "TESTrep1", "dir": "TESTrep2"})
Returns: "sub-01_ses-M00_TESTrep1-3T_TESTrep2-AP_run-01_T1w.nii.gz"
- clabtoolkit.bidstools.insert_entity(entity, entity2add, prev_entity=None)[source]
Adds entities to an existing entity dictionary or string representation.
- Parameters:
entity (dict or str) – Dictionary containing the entities or a string that follows the BIDS naming specifications.
entity2add (dict) – Dictionary containing the entities to add. IMPORTANT: If the entity2add contains keys that already exist in the entity, they will not be added.
prev_entity (str, optional) – Key in entity after which to insert the new entities.
- Returns:
Updated entity with the new entities added (matching the input type).
- Return type:
Examples
>>> insert_entity("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz", {"task": "rest"})
Returns: "sub-01_ses-M00_acq-3T_dir-AP_run-01_task-rest_T1w.nii.gz"
>>> insert_entity("sub-01_ses-M00_acq-3T_dir-AP_run-01_T1w.nii.gz", {"task": "rest"}, prev_entity="ses")
Returns: "sub-01_ses-M00_task-rest_acq-3T_dir-AP_run-01_T1w.nii.gz"
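The positioning logic (append before the suffix by default, or insert right after prev_entity) can be sketched as follows. `add_entities` is a hypothetical helper for the string case only; skipping keys that already exist follows the note on entity2add above.

```python
def add_entities(name: str, new_entities: dict, prev_entity=None) -> str:
    """Hypothetical sketch: insert new key-value tokens into a filename."""
    stem, dot, extension = name.partition(".")
    tokens = stem.split("_")
    existing = {t.split("-", 1)[0] for t in tokens if "-" in t}
    # Keys that are already present are skipped.
    to_add = [f"{k}-{v}" for k, v in new_entities.items() if k not in existing]
    # Default position: just before the suffix, if the last token is one.
    position = len(tokens) - 1 if "-" not in tokens[-1] else len(tokens)
    if prev_entity is not None:
        for index, token in enumerate(tokens):
            if "-" in token and token.split("-", 1)[0] == prev_entity:
                position = index + 1
                break
    tokens[position:position] = to_add
    return "_".join(tokens) + dot + extension


print(add_entities("sub-01_ses-M00_T1w.nii.gz", {"task": "rest"}, prev_entity="sub"))
```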
- clabtoolkit.bidstools.recursively_replace_entity_value(root_dir, dict2old, dict2new)[source]
Replaces the values of the specified entities in all the files and folders of a BIDS dataset.
- clabtoolkit.bidstools.recursively_replace_entity_key(root_dir, replacements)[source]
Replaces the keys of the specified entities in all the files and folders of a BIDS dataset.
- Parameters:
- Returns:
The method renames the files and folders in the BIDS dataset. Every file or folder whose name contains one of the old entity keys is renamed, with the old key replaced by the new one.
- Return type:
None
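Renaming both files and folders in place is easiest when the tree is walked bottom-up, so that children are renamed before their parent folders. The sketch below illustrates that idea with the standard library only; `rename_key_everywhere` is a hypothetical function, and whether the library traverses the dataset this way is an assumption.

```python
import os
import tempfile


def rename_key_everywhere(root_dir: str, replacements: dict) -> int:
    """Hypothetical sketch: replace entity keys in every file and folder name."""
    renamed = 0
    # Bottom-up traversal: entries inside a folder are handled before the
    # folder itself is renamed by its parent, so paths stay valid.
    for current, dirnames, filenames in os.walk(root_dir, topdown=False):
        for name in filenames + dirnames:
            tokens = []
            for token in name.split("_"):
                key, sep, rest = token.partition("-")
                if sep and key in replacements:
                    token = f"{replacements[key]}-{rest}"
                tokens.append(token)
            new_name = "_".join(tokens)
            if new_name != name:
                os.rename(os.path.join(current, name),
                          os.path.join(current, new_name))
                renamed += 1
    return renamed


# Demonstration on a throwaway directory (the 'visit' key is made up).
with tempfile.TemporaryDirectory() as root:
    anat = os.path.join(root, "sub-01", "ses-M00", "anat")
    os.makedirs(anat)
    open(os.path.join(anat, "sub-01_ses-M00_T1w.nii.gz"), "w").close()
    count = rename_key_everywhere(root, {"ses": "visit"})
    result = sorted(
        os.path.relpath(os.path.join(folder, f), root)
        for folder, _, files in os.walk(root)
        for f in files
    )
print(count, result)
```

Trying a change like this on a copy of the dataset first is a sensible precaution, since the renames are not reversible automatically.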
- clabtoolkit.bidstools.recursively_delete_entity(root_dir, key2rem)[source]
Deletes entities from the names of all the files and folders of a BIDS dataset.
- Parameters:
- Returns:
The method renames the files and folders in the BIDS dataset, removing the entities with the specified keys from file and folder names.
- Return type:
None
- clabtoolkit.bidstools.recursively_insert_entity(root_dir, entity2add, prev_entity=None)[source]
Inserts entities into the names of all the files and folders of a BIDS dataset.
- Parameters:
root_dir (str) – Root directory of the BIDS dataset.
entity2add (dict) – Dictionary containing the entities to add. Example: {'task': 'rest', 'run': '01'}
prev_entity (str, optional) – Key in entity after which to insert the new entities. If not given, the new entities are added at the end of the file name, just before the suffix.
- Returns:
The method renames the files and folders in the BIDS dataset, inserting the new entities into their names.
- Return type:
None
- clabtoolkit.bidstools.get_all_entities(root_dir)[source]
Returns a set of all unique entities found in the BIDS dataset.
- Parameters:
root_dir (str) – Root directory of the BIDS dataset.
- Returns:
all_entities (Set[str]) – A set of unique entity names found in the dataset.
all_suffixes (List[str]) – A list of unique suffixes found in the dataset.
- Raises:
ValueError – If the specified root directory does not exist.
FileNotFoundError – If the default configuration file is not found.
ValueError – If the default configuration JSON does not have the expected structure.
- Return type:
Examples
>>> get_all_entities('/path/to/bids/dataset')
({'sub', 'ses', 'task', 'run', ...}, ['T1w', 'bold', ...])
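Collecting the entity keys and suffixes amounts to scanning every filename once. The sketch below is illustrative only: `collect_entities` is a hypothetical helper, it does not read the configuration file mentioned under Raises, and it assumes that only files whose names start with "sub-" are relevant.

```python
import tempfile
from pathlib import Path


def collect_entities(root_dir: str):
    """Hypothetical sketch: gather entity keys and suffixes from filenames."""
    root = Path(root_dir)
    if not root.is_dir():
        raise ValueError(f"Root directory does not exist: {root_dir}")
    keys, suffixes = set(), set()
    for path in root.rglob("*"):
        # Assumption: only files named like 'sub-...' are inspected.
        if not path.is_file() or not path.name.startswith("sub-"):
            continue
        tokens = path.name.split(".", 1)[0].split("_")
        keys.update(t.split("-", 1)[0] for t in tokens if "-" in t)
        if "-" not in tokens[-1]:
            suffixes.add(tokens[-1])
    return keys, sorted(suffixes)


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("sub-01_ses-M00_T1w.nii.gz",
                     "sub-01_task-rest_run-01_bold.nii.gz",
                     "dataset_description.json"):
        (Path(tmp) / filename).touch()
    found_keys, found_suffixes = collect_entities(tmp)
print(found_keys, found_suffixes)
```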
- clabtoolkit.bidstools.entities4table(entities_json=None, selected_entities=None)[source]
Returns the BIDS entities that will be included in the morphometric table.
This function loads BIDS entities from a JSON configuration file and filters them based on optional selected entities.
- Parameters:
entities_json (str, optional) – Path to the JSON file with entity definitions. If None, the method uses the default config JSON file.
selected_entities (Union[str, Dict, List], optional) – Entities to select from the loaded entities. Can be a string with comma-separated entity names, a dictionary with entity names as keys, or a list of entity names. If None, all entities are included.
- Returns:
Dictionary of entity names and their values.
- Return type:
Dict
- Raises:
ValueError – If the provided JSON file path is invalid or the JSON format is incorrect.
FileNotFoundError – If the specified JSON file does not exist.
Examples
>>> # Using default config file (returns all entities)
>>> entities4table()
{'sub': {'...'}, 'ses': {'...'}, ... 'scale': {'...'}}
>>> # Using a custom JSON file
>>> entities4table('path/to/custom/entities.json')
{'sub': {'...'}, 'ses': {'...'}, ... 'scale': {'...'}}
>>> # Selecting specific entities
>>> entities4table(selected_entities='sub,ses,run')
{'sub': {'...'}, 'ses': {'...'}, 'run': {'...'}}
>>> # Using a dictionary to select entities
>>> entities4table(selected_entities={'sub': None, 'ses': None})
{'sub': {'...'}, 'ses': {'...'}}
>>> # Using a list to select entities
>>> entities4table(selected_entities=['sub', 'ses'])
{'sub': {'...'}, 'ses': {'...'}}
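Accepting a string, a dictionary, or a list for the same argument usually comes down to normalizing the input to a list of names first. The sketch below shows one way to do that; `select_entities` and the `sample` dictionary are made up for illustration and are not the library's code or configuration.

```python
def select_entities(entities: dict, selected=None) -> dict:
    """Hypothetical sketch: filter an entity dictionary by a flexible selection."""
    if selected is None:
        return dict(entities)
    if isinstance(selected, str):
        names = [name.strip() for name in selected.split(",") if name.strip()]
    else:
        # Iterating a dict yields its keys; iterating a list yields its items.
        names = list(selected)
    return {name: entities[name] for name in names if name in entities}


# Made-up stand-in for the contents of the configuration file.
sample = {"sub": "Participant", "ses": "Session", "run": "Run"}
print(select_entities(sample, "sub, run"))
print(select_entities(sample, {"sub": None, "ses": None}))
```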
- clabtoolkit.bidstools.entities_to_table(filepath, entities_to_extract=None, include_suffix=False)[source]
Creates a DataFrame with BIDS entities extracted from a filename.
…
- Parameters:
filepath (str) – Full path to the BIDS file from which to extract entities.
entities_to_extract (str, list, dict, or None, default=None) – Specifies which entities to extract from the filename. If str, a single entity name to extract. If list, multiple entity names to extract. If dict, keys are entity names and values are custom column names. If None, returns a single column with the full filename.
include_suffix (bool, default=False) – If True, adds a ‘Type’ column containing the BIDS suffix (e.g., ‘bold’, ‘T1w’, ‘dwi’) extracted from the filename.
- Returns:
DataFrame containing the extracted entities as columns. If include_suffix is True, a ‘Type’ column is appended at the end. If the file is not BIDS-compliant, returns an empty DataFrame.
- Return type:
pd.DataFrame
Examples
>>> df = entities_to_table(
...     '/data/sub-01/ses-pre/sub-01_ses-pre_task-rest_bold.nii.gz',
...     ['sub', 'ses'],
...     include_suffix=True
... )
>>> print(df)
  Participant Session  Type
0          01     pre  bold
- clabtoolkit.bidstools.get_subjects(bids_dir)[source]
Get a list of all subjects in the BIDS directory.
- clabtoolkit.bidstools.copy_bids_folder(bids_dir, out_dir, subjects_to_copy=None, folders_to_copy='all', deriv_dir=None, include_derivatives=None)[source]
Copies the BIDS folder and its derivatives for the given subjects to a new location.
- Parameters:
bids_dir (str) – Path to the BIDS directory.
out_dir (str) – Path to the output directory where the copied BIDS folder will be saved.
subjects_to_copy (list or str, optional) – List of subject IDs to copy. If None, all subjects will be copied.
folders_to_copy (list or str, optional) – List of BIDS folders to copy. If “all”, all folders will be copied. Default is “all”.
deriv_dir (str, optional) – Path to the derivatives directory. If None, it will be set to “derivatives” in the BIDS directory.
include_derivatives (str or list, optional) – Derivatives to include. If None (default), no derivatives are copied. If “all”, all derivatives are copied. If “chimera”, only the chimera derivatives are copied. If any other string, only the derivatives with that name are copied. If a list, only the derivatives in the list are copied.
- Returns:
None – Copies the specified folders and subjects to the output directory.
Usage example
>>> bids_dir = "/path/to/bids"
>>> out_dir = "/path/to/output"
>>> copy_bids_folder(bids_dir, out_dir, subjects_to_copy=["sub-01"], folders_to_copy=["anat"])
>>> copy_bids_folder(bids_dir, out_dir, subjects_to_copy=["sub-01"], include_derivatives=["chimera", "freesurfer"])
>>> copy_bids_folder(bids_dir, out_dir, subjects_to_copy=["sub-01"], deriv_dir="/path/to/derivatives")
- clabtoolkit.bidstools.get_bids_database_table(root_dir, output_table=None)[source]
Generate a comprehensive summary table of all neuroimaging files in a BIDS dataset.
This function scans a BIDS dataset directory structure and creates a detailed table containing all BIDS entities (subject, session, acquisition, etc.) and file counts. The output table provides an overview of the dataset composition, making it easy to identify data availability, missing files, and dataset structure.
- Parameters:
root_dir (str) – Path to the BIDS dataset root directory. This should be the top-level directory containing subject folders (sub-*) and optionally a dataset_description.json file.
output_table (str, optional) – Path where the resulting CSV table should be saved. If None, the table is not saved to disk but still returned as a DataFrame. Default is None.
- Returns:
DataFrame with columns for each detected BIDS entity (Subject, Session, Acquisition, etc.), plus ‘suffix’ (image type like T1w, FLAIR) and ‘N’ (number of files for each unique combination). Each row represents a unique combination of BIDS entities and their file count.
- Return type:
pd.DataFrame
- Raises:
FileNotFoundError – If the specified root_dir does not exist.
NotADirectoryError – If root_dir exists but is not a directory.
ValueError – If no subjects are found in the BIDS dataset (no sub-* folders).
Examples
Basic usage - analyze dataset and return summary table:
>>> import pandas as pd
>>> bids_table = get_bids_database_table('/path/to/bids/dataset')
>>> print(f"Dataset contains {len(bids_table)} unique file combinations")
>>> print(f"Total files: {bids_table['N'].sum()}")
Save summary table to CSV file:
>>> bids_table = get_bids_database_table(
...     root_dir='/data/my_study',
...     output_table='/data/my_study/bids_summary.csv'
... )
Analyze specific aspects of the dataset:
>>> # Count files by image type
>>> suffix_counts = bids_table.groupby('suffix')['N'].sum()
>>> print("Files by image type:")
>>> print(suffix_counts)
>>> # Check data availability per subject
>>> subject_counts = bids_table.groupby('Subject')['N'].sum()
>>> print("Files per subject:")
>>> print(subject_counts)
>>> # Find subjects with specific image types
>>> t1w_subjects = bids_table[bids_table['suffix'] == 'T1w']['Subject'].unique()
>>> print(f"Subjects with T1w images: {len(t1w_subjects)}")
Example output table structure:
>>> print(bids_table.head())
  Subject Session Acquisition suffix  N
0  sub-01  ses-01  acq-mprage    T1w  1
1  sub-01  ses-01   acq-space    T2w  1
2  sub-01  ses-01        None  FLAIR  1
3  sub-02  ses-01  acq-mprage    T1w  1
4  sub-02  ses-02  acq-mprage    T1w  1
Notes
Only processes .nii.gz files (NIfTI compressed format)
Automatically detects all BIDS entities present in the dataset
Groups identical combinations and sums file counts
Results are sorted by Subject, Session, and suffix for readability
Progress is displayed using Rich progress bar during processing
Column names are converted to human-readable format (e.g., ‘sub’ -> ‘Subject’)
See also
clabtoolkit.bidstools.get_subjects: Get list of subjects in BIDS dataset
clabtoolkit.bidstools.get_all_entities: Extract all BIDS entities from dataset
clabtoolkit.bidstools.str2entity: Parse BIDS filename to extract entities
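The grouping step described in the Notes (one row per unique entity combination, with a file count N) can be mimicked with a Counter. This is a simplified sketch that works on a list of filenames instead of a directory and returns tuples instead of a DataFrame; `count_combinations` is a hypothetical name, and only the sub and ses entities are tracked.

```python
from collections import Counter


def count_combinations(filenames):
    """Hypothetical sketch: count files per (sub, ses, suffix) combination."""
    counts = Counter()
    for name in filenames:
        # Only compressed NIfTI files are counted, as stated in the Notes.
        if not name.endswith(".nii.gz"):
            continue
        tokens = name[: -len(".nii.gz")].split("_")
        entities = dict(t.split("-", 1) for t in tokens if "-" in t)
        suffix = tokens[-1] if "-" not in tokens[-1] else ""
        # Missing entities are stored as empty strings so rows stay sortable.
        counts[(entities.get("sub", ""), entities.get("ses", ""), suffix)] += 1
    return sorted(counts.items())


files = [
    "sub-01_ses-01_T1w.nii.gz",
    "sub-01_ses-01_run-01_bold.nii.gz",
    "sub-01_ses-01_run-02_bold.nii.gz",
    "sub-02_ses-01_T1w.nii.gz",
    "sub-02_ses-01_T1w.json",
]
for combination, n in count_combinations(files):
    print(combination, n)
```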
- clabtoolkit.bidstools.get_derivatives_folders(deriv_dir)[source]
Get a list of all derivatives folders in the specified directory.
- Parameters:
deriv_dir (str) – Path to the derivatives directory.
- Returns:
List of derivatives folder names.
- Return type:
- Raises:
ValueError – If the derivatives directory does not exist.
TypeError – If the derivatives directory is not a string.
Usage example
>>> deriv_dir = "/path/to/derivatives"
>>> print(get_derivatives_folders(deriv_dir))
- clabtoolkit.bidstools.is_bids_filename(filename)[source]
Validates a BIDS filename structure, handling extensions and entity order.
- Parameters:
filename (str) – The filename to validate.
- Returns:
True if the filename is a valid BIDS filename, False otherwise.
- Return type:
bool
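A structural check of this kind can be approximated with a regular expression. The sketch below is a deliberately loose illustration, not the library's validation logic: `looks_like_bids` is a hypothetical helper that requires a leading sub entity, a suffix, and an extension, and it does not check entity order.

```python
import re

# Assumption: labels and keys are alphanumeric; entity order is not verified.
_BIDS_NAME = re.compile(
    r"sub-[A-Za-z0-9]+"                # the subject entity comes first
    r"(_[A-Za-z0-9]+-[A-Za-z0-9]+)*"   # any number of further key-value pairs
    r"_[A-Za-z0-9]+"                   # the suffix
    r"(\.[A-Za-z0-9]+)+"               # one or more extension parts
)


def looks_like_bids(filename: str) -> bool:
    """Hypothetical sketch: True if the name has the basic BIDS shape."""
    return _BIDS_NAME.fullmatch(filename) is not None


for candidate in ("sub-01_ses-M00_T1w.nii.gz", "T1w.nii.gz", "sub-01_ses-M00.nii.gz"):
    print(candidate, looks_like_bids(candidate))
```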
- clabtoolkit.bidstools.get_individual_files_and_folders(input_folder, cad4query)[source]
This function detects all the files or folders inside a folder and its subfolders containing the strings supplied by the variable cad4query.
- Parameters:
- Returns:
List of files or folders that match the query.
- Return type:
- Raises:
ValueError – If the input folder does not exist.
TypeError – If the input folder is not a string.
Examples
>>> input_folder = "/path/to/input/folder"
>>> cad4query = "sub-01"
>>> files = get_individual_files_and_folders(input_folder, cad4query)
- clabtoolkit.bidstools.generate_bids_tree(bids_root, max_depth=None, show_hidden=False, exclude_patterns=None, save_to_file=None)[source]
Generate an MS-DOS tree-style visualization of a BIDS folder structure.
- Parameters:
bids_root (str) – Path to the BIDS root directory.
max_depth (int, optional) – Maximum depth to traverse. If None (default), traverses entire directory structure without depth limitation.
show_hidden (bool, optional) – Whether to show hidden files and folders (starting with ‘.’). Default is False.
exclude_patterns (set of str, optional) – Set of file/folder name patterns to exclude from the tree. If None, defaults to {‘.git’, ‘__pycache__’, ‘.DS_Store’, ‘Thumbs.db’}.
save_to_file (str, optional) – Path to save the tree output as a text file. If None, only returns the string without saving.
- Returns:
MS-DOS tree representation of the BIDS structure with proper tree symbols (├──, └──, │) and directory indicators (/).
- Return type:
- Raises:
FileNotFoundError – If the specified bids_root path does not exist.
NotADirectoryError – If the specified bids_root path is not a directory.
PermissionError – If there are insufficient permissions to read certain directories. Individual permission errors are handled gracefully and noted in output.
OSError – If there are file system related errors during tree generation or file saving operations.
Notes
Directories are displayed with a trailing ‘/’ to distinguish from files
Items are sorted with directories first, then files, both alphabetically
Hidden files/folders (starting with ‘.’) are excluded by default
Permission errors for individual subdirectories are handled gracefully
The tree uses standard MS-DOS tree symbols for proper visualization
When max_depth is None, the entire directory structure is traversed
Examples
Basic usage with unlimited depth:
>>> tree = generate_bids_tree('/path/to/bids/dataset')
>>> print(tree)
my-bids-dataset/
├── dataset_description.json
├── participants.tsv
├── sub-01/
│   ├── anat/
│   │   └── sub-01_T1w.nii.gz
│   └── func/
│       ├── sub-01_task-rest_bold.nii.gz
│       └── sub-01_task-rest_events.tsv
└── derivatives/
    └── preprocessing/
        └── sub-01/
Limited depth with file saving:
>>> tree = generate_bids_tree('/path/to/bids/dataset',
...                           max_depth=2,
...                           save_to_file='bids_tree.txt')
>>> print("Tree saved to bids_tree.txt")
Include hidden files and custom exclusions:
>>> tree = generate_bids_tree('/path/to/bids/dataset',
...                           show_hidden=True,
...                           exclude_patterns={'temp', 'backup'})
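The tree layout itself follows a simple recursive pattern: each level extends the prefix with either a vertical bar or blank padding, depending on whether the parent was the last entry. The sketch below shows that pattern with the sorting rule from the Notes (directories first, then files); `render_tree` is a hypothetical helper that omits exclusion patterns, file saving, and permission handling.

```python
import tempfile
from pathlib import Path


def render_tree(root: Path, max_depth=None, _prefix="", _depth=0):
    """Hypothetical sketch: return the lines of a tree-style listing."""
    lines = [f"{root.name}/"] if _depth == 0 else []
    if max_depth is not None and _depth >= max_depth:
        return lines
    # Directories first, then files, each group alphabetically.
    children = sorted(root.iterdir(), key=lambda p: (p.is_file(), p.name))
    # Hidden entries are skipped, matching the documented default.
    children = [c for c in children if not c.name.startswith(".")]
    for index, child in enumerate(children):
        last = index == len(children) - 1
        connector = "└── " if last else "├── "
        label = child.name + ("/" if child.is_dir() else "")
        lines.append(_prefix + connector + label)
        if child.is_dir():
            padding = "    " if last else "│   "
            lines.extend(render_tree(child, max_depth, _prefix + padding, _depth + 1))
    return lines


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my-bids-dataset"
    (root / "sub-01" / "anat").mkdir(parents=True)
    (root / "sub-01" / "anat" / "sub-01_T1w.nii.gz").touch()
    (root / "dataset_description.json").touch()
    tree = "\n".join(render_tree(root))
    shallow = render_tree(root, max_depth=1)
print(tree)
```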
- clabtoolkit.bidstools.generate_bids_tree_with_stats(bids_root, **kwargs)[source]
Generate a BIDS tree with additional statistics.
- Parameters:
bids_root (str) – Path to the BIDS root directory.
**kwargs – Additional keyword arguments passed to generate_bids_tree(). See generate_bids_tree() documentation for available parameters.
- Returns:
Tree representation with file and folder count statistics appended.
- Return type:
- Raises:
FileNotFoundError – If the specified bids_root path does not exist.
NotADirectoryError – If the specified bids_root path is not a directory.
PermissionError – If there are insufficient permissions to read directories.
OSError – If there are file system related errors.
Notes
Statistics are calculated by recursively counting all files and directories in the BIDS structure, regardless of the max_depth parameter used for tree visualization.
Examples
>>> tree_with_stats = generate_bids_tree_with_stats('/path/to/bids/dataset')
>>> print(tree_with_stats)
my-bids-dataset/
├── dataset_description.json
└── sub-01/
    └── anat/
        └── sub-01_T1w.nii.gz

Statistics:
├── Directories: 2
└── Files: 2
- clabtoolkit.bidstools.validate_bids_structure(bids_root)[source]
Performs a basic validation of the BIDS structure and returns warnings.
- Parameters:
bids_root (str) – Path to the BIDS root directory.
- Returns:
List of validation warnings and notes about the BIDS structure. Empty list indicates no issues found.
- Return type:
- Raises:
FileNotFoundError – If the specified bids_root path does not exist.
NotADirectoryError – If the specified bids_root path is not a directory.
Notes
This function performs basic BIDS validation, including:
Checking for required files (dataset_description.json)
Verifying presence of subject directories (sub-*)
Noting presence of derivatives directory
For comprehensive BIDS validation, consider using the official BIDS validator tool.
Examples
>>> warnings = validate_bids_structure('/path/to/bids/dataset')
>>> if warnings:
...     for warning in warnings:
...         print(f"⚠️ {warning}")
... else:
...     print("✅ Basic BIDS structure looks good!")
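The three checks listed in the Notes are straightforward to express with pathlib. The sketch below is an approximation for illustration: `basic_bids_checks` is a hypothetical function, and the message texts are invented rather than taken from the library.

```python
import tempfile
from pathlib import Path


def basic_bids_checks(bids_root: str) -> list:
    """Hypothetical sketch: return notes about a BIDS folder's basic layout."""
    root = Path(bids_root)
    if not root.exists():
        raise FileNotFoundError(f"Path does not exist: {bids_root}")
    if not root.is_dir():
        raise NotADirectoryError(f"Path is not a directory: {bids_root}")
    notes = []
    if not (root / "dataset_description.json").is_file():
        notes.append("Missing dataset_description.json")
    if not any(p.is_dir() for p in root.glob("sub-*")):
        notes.append("No subject directories (sub-*) found")
    if (root / "derivatives").is_dir():
        notes.append("Derivatives directory present")
    return notes


# Demonstration: an empty folder first, then a minimal valid layout.
with tempfile.TemporaryDirectory() as tmp:
    before = basic_bids_checks(tmp)
    (Path(tmp) / "dataset_description.json").touch()
    (Path(tmp) / "sub-01").mkdir()
    after = basic_bids_checks(tmp)
print(before, after)
```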
- clabtoolkit.bidstools.load_bids_json(bids_json=None)[source]
Load the JSON file containing the BIDS configuration.
The bidstools module provides comprehensive support for Brain Imaging Data Structure (BIDS) datasets. This module enables you to work with BIDS naming conventions, manipulate entities, organize datasets, and generate database tables from BIDS structures.
Key Features
Convert between BIDS filename strings and entity dictionaries
Manipulate BIDS entities (add, remove, replace)
Extract subject lists and dataset summaries
Copy and organize BIDS folders with filtering
Generate comprehensive database tables from BIDS datasets
Validate BIDS compliance
Common Usage Examples
Basic entity manipulation:
import clabtoolkit.bidstools as bids
# Parse BIDS filename
entities = bids.str2entity("sub-01_ses-M00_T1w.nii.gz")
# Returns: {'sub': '01', 'ses': 'M00', 'suffix': 'T1w', 'extension': 'nii.gz'}
# Convert back to filename
filename = bids.entity2str(entities)
# Modify entities
new_filename = bids.replace_entity_value(filename, {'ses': 'M12'})
Dataset management:
# Get all subjects in a BIDS dataset
subjects = bids.get_subjects("/path/to/bids/dataset")
# Generate dataset overview table
database = bids.get_bids_database_table("/path/to/bids/dataset")
# Copy subset of BIDS dataset
bids.copy_bids_folder(
    bids_dir="/path/to/source",
    out_dir="/path/to/target",
    subjects_to_copy=['sub-01', 'sub-02']
)