paradigma.orchestrator
High-level pipeline orchestrator for the ParaDigMa toolbox.
This module provides the main entry point for running analysis pipelines:
Main Function
run_paradigma(): Complete pipeline from data loading/preparation to aggregated results. Main entry point for end-to-end analysis supporting multiple pipelines (gait, tremor, pulse_rate). Can process raw data from disk or already-prepared DataFrames.
The orchestrator coordinates:
1. Data loading and preparation (unit conversion, resampling, orientation correction)
2. Pipeline execution on single or multiple files (imports from pipeline modules)
3. Result aggregation across files and segments
4. Optional intermediate result storage
Supports multi-file processing with automatic segment numbering and metadata tracking.
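A minimal end-to-end invocation might look like the following sketch; the data directory and watch side are placeholder assumptions, not values prescribed by the library.

```python
from paradigma.orchestrator import run_paradigma

# Minimal sketch: load raw files from "data/imu" (hypothetical directory),
# prepare them, run the gait pipeline, and return aggregated results.
results = run_paradigma(
    data_path="data/imu",
    pipelines="gait",
    watch_side="left",  # required for the gait pipeline
)

print(results["aggregations"]["gait"])
```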
Attributes
- logger
- DETAILED_INFO
Functions
- run_paradigma(): Complete ParaDigMa analysis pipeline from data loading to aggregated results.
Module Contents
- paradigma.orchestrator.logger
- paradigma.orchestrator.DETAILED_INFO = 15
paradigma.orchestrator.run_paradigma(*, data_path: str | pathlib.Path | None = None, dfs: pandas.DataFrame | list[pandas.DataFrame] | dict[str, pandas.DataFrame] | None = None, save_intermediate: list[str] = [], output_dir: str | pathlib.Path = './output', skip_preparation: bool = False, pipelines: list[str] | str | None = None, watch_side: str | None = None, accelerometer_units: str = 'g', gyroscope_units: str = 'deg/s', time_input_unit: paradigma.constants.TimeUnit = TimeUnit.RELATIVE_S, target_frequency: float = 100.0, column_mapping: dict[str, str] | None = None, device_orientation: list[str] | None = ['x', 'y', 'z'], file_pattern: str | list[str] | None = None, aggregates: list[str] | None = None, segment_length_bins: list[str] | None = None, split_by_gaps: bool = False, max_gap_seconds: float | None = None, min_segment_seconds: float | None = None, imu_config: paradigma.config.IMUConfig | None = None, ppg_config: paradigma.config.PPGConfig | None = None, gait_config: paradigma.config.GaitConfig | None = None, arm_activity_config: paradigma.config.GaitConfig | None = None, tremor_config: paradigma.config.TremorConfig | None = None, pulse_rate_config: paradigma.config.PulseRateConfig | None = None, logging_level: int = logging.INFO, custom_logger: logging.Logger | None = None) → dict[str, pandas.DataFrame | dict]
Complete ParaDigMa analysis pipeline from data loading to aggregated results.
This is the main entry point for ParaDigMa analysis. It supports multiple pipeline types:
- gait: Arm swing during gait analysis
- tremor: Tremor detection and quantification
- pulse_rate: Pulse rate estimation from PPG signals
The function:
1. Loads data files from the specified directory or uses provided DataFrames
2. Prepares raw data if needed (unit conversion, resampling, etc.)
3. Runs the specified pipeline on each data file
4. Aggregates results across all data files
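As a sketch of a combined run (per the pipelines parameter below, gait and tremor are currently the only pipelines that can be run together); the directory path is a placeholder:

```python
from paradigma.orchestrator import run_paradigma

# Sketch: run gait and tremor in one call over the same raw IMU files.
results = run_paradigma(
    data_path="data/imu",  # placeholder directory of raw IMU files
    pipelines=["gait", "tremor"],
    watch_side="left",
)

# Results are keyed per pipeline.
for name, df in results["quantifications"].items():
    print(name, len(df))
```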
- Parameters:
data_path (str or Path, optional) – Path to directory containing data files.
dfs (DataFrame, list of DataFrames, or dict of DataFrames, optional) – DataFrames used as input (bypasses data loading). Can be:
  - Single DataFrame: processed as one file with key ‘df_1’.
  - List[DataFrame]: multiple DataFrames assigned IDs as ‘df_1’, ‘df_2’, etc.
  - Dict[str, DataFrame]: keys are file names, values are DataFrames.
  Note: the ‘file_key’ column is only added to quantification results when len(dfs) > 1, allowing cleaner output for single-file processing. See the input_formats guide for details, and the sketch under Examples below.
save_intermediate (list of str, default []) – Which intermediate results to store. Valid values:
  - ‘preparation’: Save prepared data
  - ‘preprocessing’: Save preprocessed signals
  - ‘classification’: Save classification results
  - ‘quantification’: Save quantified measures
  - ‘aggregation’: Save aggregated results
  If empty, no files are saved (results are only returned).
output_dir (str or Path, default './output') – Output directory for all results. Files are only saved if save_intermediate is not empty.
skip_preparation (bool, default False) – Whether data is already prepared. If False, data will be prepared (unit conversion, resampling, etc.). If True, assumes data is already in the required format.
pipelines (list of str or str, optional) – Pipelines to run: ‘gait’, ‘tremor’, and/or ‘pulse_rate’. If a list is provided, note that currently only the gait and tremor pipelines can be run together.
watch_side (str, optional) – Watch side: ‘left’ or ‘right’ (required for gait pipeline).
accelerometer_units (str, default 'g') – Units for accelerometer data.
gyroscope_units (str, default 'deg/s') – Units for gyroscope data.
time_input_unit (TimeUnit, default TimeUnit.RELATIVE_S) – Input time unit type.
target_frequency (float, default 100.0) – Target sampling frequency for resampling.
column_mapping (dict, optional) – Custom column name mapping.
device_orientation (list of str, optional) – Custom device orientation corrections.
file_pattern (str or list of str, optional) – File pattern(s) to match when loading data (e.g., ‘parquet’, ‘*.csv’).
aggregates (list of str, optional) – Aggregation methods for quantification.
segment_length_bins (list of str, optional) – Duration bins for gait segment aggregation (gait pipeline only). Example: [‘(0, 10)’, ‘(10, 20)’] for segments 0-10s and 10-20s.
split_by_gaps (bool, default False) – If True, automatically split non-contiguous data into segments during preparation. Adds a ‘data_segment_nr’ column to the prepared data which is preserved through the pipeline. Useful for handling data with gaps/interruptions; see the sketch under Examples below.
max_gap_seconds (float, optional) – Maximum gap (seconds) before starting new segment. Used when split_by_gaps=True. Defaults to 1.5s.
min_segment_seconds (float, optional) – Minimum segment length (seconds) to keep. Used when split_by_gaps=True. Defaults to 1.5s.
imu_config (IMUConfig, optional) – IMU preprocessing configuration.
ppg_config (PPGConfig, optional) – PPG preprocessing configuration.
gait_config (GaitConfig, optional) – Gait analysis configuration.
arm_activity_config (GaitConfig, optional) – Arm activity analysis configuration.
tremor_config (TremorConfig, optional) – Tremor analysis configuration.
pulse_rate_config (PulseRateConfig, optional) – Pulse rate analysis configuration.
logging_level (int, default logging.INFO) – Logging level using standard logging constants:
  - logging.ERROR: Only errors
  - logging.WARNING: Warnings and errors
  - logging.INFO: Basic progress information (default)
  - logging.DEBUG: Detailed debug information
  Can also use DETAILED_INFO (15) for an intermediate level of detail; see the sketch under Examples below.
custom_logger (logging.Logger, optional) – Custom logger instance. If provided, logging_level is ignored. Allows full control over logging configuration.
- Returns:
Complete analysis results with nested structure for multiple pipelines:
- ‘quantifications’: dict with pipeline names as keys and DataFrames as values
- ‘aggregations’: dict with pipeline names as keys and result dicts as values
- ‘metadata’: dict with pipeline names as keys and metadata dicts as values
- ‘errors’: list of dicts tracking any errors that occurred during processing. Each error dict contains ‘stage’, ‘error’, and optionally ‘file’ and ‘pipeline’; an empty list indicates successful processing of all files (see the sketch under Examples below).
- Return type:
dict
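Examples

The dfs parameter accepts three input shapes, as described above. The following sketch illustrates all three; the file names, keys, and DataFrames are placeholder assumptions, not values prescribed by the library.

```python
import pandas as pd
from paradigma.orchestrator import run_paradigma

df_left = pd.read_parquet("left_wrist.parquet")    # hypothetical prepared file
df_right = pd.read_parquet("right_wrist.parquet")  # hypothetical prepared file

# Single DataFrame: processed as one file with key 'df_1';
# no 'file_key' column is added to the quantification results.
results = run_paradigma(dfs=df_left, pipelines="tremor", skip_preparation=True)

# List of DataFrames: assigned IDs 'df_1', 'df_2', ...;
# a 'file_key' column is added because len(dfs) > 1.
results = run_paradigma(dfs=[df_left, df_right], pipelines="tremor", skip_preparation=True)

# Dict of DataFrames: keys are used as file names.
results = run_paradigma(
    dfs={"left_wrist": df_left, "right_wrist": df_right},
    pipelines="tremor",
    skip_preparation=True,
)
```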
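For recordings with interruptions, split_by_gaps segments the data during preparation, and segment_length_bins controls how gait segments are binned during aggregation. In this sketch the path, thresholds, and bins are illustrative choices, not library defaults.

```python
from paradigma.orchestrator import run_paradigma

# Sketch: split non-contiguous recordings into segments and aggregate
# gait measures per duration bin. Paths and thresholds are placeholders.
results = run_paradigma(
    data_path="data/imu",
    pipelines="gait",
    watch_side="right",
    split_by_gaps=True,
    max_gap_seconds=2.0,      # start a new segment after gaps longer than 2 s
    min_segment_seconds=5.0,  # discard segments shorter than 5 s
    segment_length_bins=["(0, 10)", "(10, 20)"],  # bin gait segments by duration
)
```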
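For progress output more granular than logging.INFO but without full debug noise, the module-level DETAILED_INFO constant (15) can be passed as the logging level. The sketch below (placeholder path) also shows checking the ‘errors’ entry of the returned dict, which is an empty list when every file was processed successfully.

```python
import logging
from paradigma.orchestrator import run_paradigma, DETAILED_INFO

results = run_paradigma(
    data_path="data/ppg",         # placeholder path
    pipelines="pulse_rate",
    logging_level=DETAILED_INFO,  # 15, between logging.DEBUG (10) and logging.INFO (20)
)

# 'errors' tracks per-file/per-stage failures; empty means all files succeeded.
for err in results["errors"]:
    logging.getLogger(__name__).warning(
        "Stage %s failed: %s", err["stage"], err["error"]
    )
```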