Data Input Formats Guide
ParaDigMa’s run_paradigma() function accepts several flexible input formats for passing data to the analysis pipeline.
Prerequisites
Before using ParaDigMa, ensure your data meets the following requirements:
Sensor requirements: See Sensor Requirements
Device compatibility: See Supported Devices
Data format: Pandas DataFrame with required columns (see below)
Input Format Options
The dfs parameter accepts three input formats:
1. Single DataFrame
Use when you have a single prepared DataFrame to analyze:
import pandas as pd
from paradigma.orchestrator import run_paradigma
# Load your data
df = pd.read_parquet('data.parquet')
# Process with a single DataFrame
results = run_paradigma(
    dfs=df,                             # Single DataFrame
    pipelines=['gait'],
    watch_side='right',                 # Required for gait pipeline
    save_intermediate=['aggregation']   # Saves to ./output by default
)
The DataFrame is automatically assigned the identifier 'df_1' internally.
2. List of DataFrames
Use when you have multiple DataFrames that should be automatically assigned sequential IDs:
# Load multiple data segments
df1 = pd.read_parquet('morning_session.parquet')
df2 = pd.read_parquet('afternoon_session.parquet')
df3 = pd.read_parquet('evening_session.parquet')
# Process as a list - automatically assigned to 'df_1', 'df_2', 'df_3'
results = run_paradigma(
    dfs=[df1, df2, df3],
    pipelines=['gait'],
    watch_side='right',
    save_intermediate=['quantification', 'aggregation']
)
Benefits:
Automatic segment ID assignment
Each DataFrame processed independently before aggregation
Aggregation performed across all input DataFrames
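When sessions live in separate files, sorting the paths first keeps 'df_1', 'df_2', etc. aligned with chronological order, since IDs follow list order. A minimal sketch (the directory and glob pattern are illustrative, not a ParaDigMa convention):

from pathlib import Path

import pandas as pd
from paradigma.orchestrator import run_paradigma

# Sorted so that 'df_1' corresponds to the earliest session
session_files = sorted(Path('./data/sessions').glob('*.parquet'))
dfs = [pd.read_parquet(f) for f in session_files]

results = run_paradigma(
    dfs=dfs,
    pipelines=['gait'],
    watch_side='right',
)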
3. Dictionary of DataFrames
Use when you need custom identifiers for your data segments:
# Create dictionary with custom segment identifiers
dfs = {
    'patient_001_morning': pd.read_parquet('session1.parquet'),
    'patient_001_evening': pd.read_parquet('session2.parquet'),
    'patient_002_morning': pd.read_parquet('session3.parquet'),
}
# Process with custom segment identifiers
results = run_paradigma(
    dfs=dfs,
    pipelines=['gait'],
    watch_side='right',
    save_intermediate=[]   # No files saved - results only in memory
)
Benefits:
Custom segment identifiers in output
Improved traceability of data sources
Useful for multi-patient or multi-session datasets
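If your segments are stored as separate files, you can build the dictionary programmatically. A minimal sketch, assuming filenames like patient_001_morning.parquet (the directory layout is illustrative):

from pathlib import Path

import pandas as pd

# Keys taken from filenames, e.g. 'patient_001_morning'
dfs = {
    path.stem: pd.read_parquet(path)
    for path in sorted(Path('./data').glob('*.parquet'))
}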
Loading Data from Disk
To automatically load data files from a directory:
from paradigma.orchestrator import run_paradigma
# Load all files from a directory
results = run_paradigma(
    data_path='./data/patient_001/',
    pipelines=['gait'],
    watch_side='right',
    file_pattern='*.parquet',        # Optional: filter by pattern
    save_intermediate=['aggregation']
)
Supported file formats:
Pandas: .parquet, .csv, .pkl, .pickle
TSDF: .meta + .bin pairs
Device-specific: .avro (Empatica), .cwa (Axivity)
See Supported Devices for device-specific loading examples.
Required DataFrame Columns
Your DataFrame must contain the following columns, depending on the pipeline:
For Gait and Tremor Pipelines
# Required columns
df.columns = ['time', 'accelerometer_x', 'accelerometer_y', 'accelerometer_z',
              'gyroscope_x', 'gyroscope_y', 'gyroscope_z']
time: Timestamp (float seconds or datetime)
accelerometer_x, accelerometer_y, accelerometer_z: Accelerometer data
gyroscope_x, gyroscope_y, gyroscope_z: Gyroscope data
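A quick sanity check before running the pipeline can catch missing columns early; a minimal sketch based on the requirements above:

# Columns required by the gait and tremor pipelines (from the list above)
required = {
    'time',
    'accelerometer_x', 'accelerometer_y', 'accelerometer_z',
    'gyroscope_x', 'gyroscope_y', 'gyroscope_z',
}
missing = required - set(df.columns)
if missing:
    raise ValueError(f"Missing required columns: {sorted(missing)}")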
For Pulse Rate Pipeline
# Required columns
df.columns = ['time', 'ppg'] # Accelerometer optional
time: Timestamp (float seconds or datetime)
ppg: PPG/BVP signal
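If your PPG samples arrive as a raw array, a minimal sketch for building a valid DataFrame (the 64 Hz sampling rate is only an example value):

import numpy as np
import pandas as pd

fs = 64.0                                   # example sampling frequency in Hz
ppg_signal = np.random.randn(int(60 * fs))  # placeholder for one minute of real BVP data

df_ppg = pd.DataFrame({
    'time': np.arange(len(ppg_signal)) / fs,  # relative seconds
    'ppg': ppg_signal,
})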
Custom Column Names
If your data uses different column names, rename the columns or use column_mapping:
results = run_paradigma(
    dfs=df,
    pipelines=['gait'],
    watch_side='left',
    column_mapping={
        'timestamp': 'time',
        'acc_x': 'accelerometer_x',
        'acc_y': 'accelerometer_y',
        'acc_z': 'accelerometer_z',
        'gyr_x': 'gyroscope_x',
        'gyr_y': 'gyroscope_y',
        'gyr_z': 'gyroscope_z'
    }
)
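Alternatively, you can rename the columns yourself with plain pandas before calling run_paradigma; the original column names here are illustrative:

# Same effect as column_mapping, applied up front with pandas
df = df.rename(columns={
    'timestamp': 'time',
    'acc_x': 'accelerometer_x',
    'acc_y': 'accelerometer_y',
    'acc_z': 'accelerometer_z',
    'gyr_x': 'gyroscope_x',
    'gyr_y': 'gyroscope_y',
    'gyr_z': 'gyroscope_z',
})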
Data Preparation Parameters
If your data needs preparation (unit conversion, resampling, etc.), ParaDigMa can handle it automatically:
results = run_paradigma(
    dfs=df_raw,
    pipelines=['gait'],
    watch_side='left',
    skip_preparation=False,         # Default: perform preparation

    # Unit conversion
    accelerometer_units='m/s^2',    # Auto-converts to 'g'
    gyroscope_units='rad/s',        # Auto-converts to 'deg/s'

    # Resampling
    target_frequency=100.0,

    # Time handling
    time_input_unit='relative_s',   # Or 'absolute_datetime'

    # Orientation correction
    device_orientation=['x', 'y', 'z'],

    # Segmentation for non-contiguous data
    split_by_gaps=True,
    max_gap_seconds=1.5,
    min_segment_seconds=1.5,
)
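Before settling on max_gap_seconds, it can help to inspect the gaps actually present in your recording. A minimal sketch, assuming time holds relative seconds as in the examples above:

import numpy as np

# Inspect sample-to-sample intervals to pick sensible thresholds
gaps = np.diff(df_raw['time'].to_numpy())
print(f"Median sample interval: {np.median(gaps):.4f} s")
print(f"Largest gap: {gaps.max():.2f} s")
print(f"Gaps > 1.5 s: {(gaps > 1.5).sum()}")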
If your data is already prepared (correct units, sampling rate, column names), skip preparation:
results = run_paradigma(
    dfs=df_prepared,
    pipelines=['gait', 'tremor'],
    watch_side='left',
    skip_preparation=True
)
Output Control
Output Directory
results = run_paradigma(
    dfs=df,
    pipelines=['gait'],
    watch_side='left',
    output_dir='./results',   # Custom output directory (default: './output')
)
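To verify what was written, you can list the output directory afterwards; a minimal sketch (the exact file layout depends on your pipelines and save_intermediate settings):

from pathlib import Path

# List everything the run wrote under the output directory
for path in sorted(Path('./results').rglob('*')):
    if path.is_file():
        print(path)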
Saving Intermediate Results
Control which intermediate steps are saved to disk:
results = run_paradigma(
    dfs=df,
    pipelines=['gait'],
    watch_side='left',
    save_intermediate=[
        'preparation',      # Prepared data
        'preprocessing',    # Preprocessed data
        'classification',   # Gait/tremor bout classifications
        'quantification',   # Segment-level measures
        'aggregation'       # Aggregated measures
    ]
)
To keep results only in memory without saving files:
results = run_paradigma(
    dfs=df,
    pipelines=['gait'],
    watch_side='left',
    save_intermediate=[]   # No files saved
)
Results Structure
Regardless of input format, results are returned in the same structure:
results = {
    'quantifications': {
        'gait': pd.DataFrame,    # Segment-level gait measures
        'tremor': pd.DataFrame,  # Segment-level tremor measures
    },
    'aggregations': {
        'gait': dict,    # Time-period aggregated gait measures
        'tremor': dict,  # Time-period aggregated tremor measures
    },
    'metadata': dict,  # Analysis metadata
    'errors': list     # List of errors (empty if successful)
}
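A minimal sketch for inspecting the returned structure (the available keys depend on which pipelines you ran):

# Inspect segment-level gait measures, if the gait pipeline was run
gait_df = results['quantifications'].get('gait')
if gait_df is not None:
    print(gait_df.head())

# Inspect which aggregated measures each pipeline produced
for pipeline, aggregates in results['aggregations'].items():
    print(pipeline, list(aggregates.keys()))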
Error Tracking
The errors list contains any errors encountered during processing. Always check this after running:
if results['errors']:
    print(f"Warning: {len(results['errors'])} error(s) occurred")
    for error in results['errors']:
        print(f"  Stage: {error['stage']}")
        print(f"  Error: {error['error']}")
        if 'file' in error:
            print(f"  File: {error['file']}")
Each error dict contains:
stage: Where the error occurred (loading, preparation, pipeline_execution, aggregation)
error: Error message
file: Filename (optional; present if the error is file-specific)
pipeline: Pipeline name (optional; present if the error is pipeline-specific)
File Key Column
When processing multiple files, the quantifications DataFrame includes a file_key column:
Single DataFrame input: no file_key column
List input (2+ files): 'df_1', 'df_2', etc.
Dict input (2+ files): the custom keys you provided
This preserves traceability while keeping single-file results concise.
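For example, in a multi-file run you can summarise quantifications per input segment via the file_key column; a minimal sketch:

# Count quantified segments per input file (multi-file run assumed)
gait_df = results['quantifications']['gait']
print(gait_df.groupby('file_key').size())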
Best Practices
Single DataFrame: Use for single files or pre-aggregated data
List of DataFrames: Use when you don’t need specific naming
Dictionary of DataFrames: Use when segment identifiers are important for traceability
Check the file_key column: Trace results back to input segments in multi-file processing
Skip preparation: Set skip_preparation=True if data is already standardized
Save selectively: Only save the intermediate results you need to reduce disk usage
See Also
Sensor Requirements - What sensor specs are needed
Supported Devices - Device-specific loading examples
Data Preparation Tutorial - Step-by-step preparation guide